PREFACE


This  report  was  prepared  as  part  of  Rand’s  DoD  Training  and  Man- 
power Management  Program,  sponsored  by  the  Human  Resources  Research 
Office of the Defense Advanced Research Projects Agency (ARPA). With
manpower  issues  assuming  an  ever  greater  importance  in  defense  planning 
and  budgeting,  the  purpose  of  this  research  program  is  to  develop  broad 
strategies  and  specific  solutions  for  dealing  with  present  and  future 
military  manpower  problems.  This  includes  the  development  of  new  re- 
search methodologies  for  examining  broad  classes  of  manpower  problems, 
as  well  as  specific  problem-oriented  research.  In  addition  to  provid- 
ing analysis  of  current  and  future  manpower  issues,  it  is  hoped  that 
this  research  program  will  contribute  to  a better  general  understand- 
ing of  the  manpower  problems  confronting  the  Department  of  Defense. 

This  report  presents  a methodology  for  using  supervisory  evalua- 
tions of  military  personnel  in  models  of  manpower  performance.  Although 
the  measurement  of  performance  is  crucial  to  many  manpower  models,  fre- 
quently the  only  measures  available  are  those  obtained  from  supervisors. 
Past  research  has  shown,  however,  that  such  ratings  may  be  subject  to 
biases,  perhaps  unintentional,  making  it  difficult  to  determine  the 
extent  to  which  the  ratings  reflect  "true"  performance  or  the  super- 
visor's own  implicit  rating  scale. 

This  report  provides  a way  of  correcting  for  these  biases.  In 
particular, since researchers may want to assess the contribution of
various factors to individual performance (often through the use of
multiple regression models), it is necessary to have a method of adjusting
the subjective measure of performance. The resulting approach to
the problem, the multi-scale model, suggests that supervisory evaluations
of individuals are subject to two types of biases. The first is
the familiar location bias: some supervisors may grade "easy"
while others grade "hard." The second is a scale effect: some
supervisors may exaggerate differences among individuals while others
may minimize these differences. The generic name for the methodology
presented here, the multi-scale model, derives from the latter bias.

This research was motivated by previous Rand research under the
DoD  Training  and  Manpower  Management  Program  that  was  concerned  with 
measuring the cost of on-the-job training for first-term enlisted personnel.
Indeed, the basic idea was first sketched out in Robert M.
Gay, Estimating the Cost of On-the-Job Training in Military Occupations:
A Methodology and Pilot Study, The Rand Corporation, R-1351-ARPA,
April 1974. It was decided to extend the brief discussion of the
model  contained  there,  both  because  the  use  of  supervisory  ratings  is 
important  to  manpower  planners  and  researchers  in  general  and  because 
on-going  research  at  Rand  dealing  with  first-term  enlisted  personnel 
performance  requires  such  a model. 

This  report  presents  the  model  in  the  context  of  an  extension  to 
the classical regression model. The report is technical in nature and
assumes that the reader has a good understanding of standard econometric
theory. 

Finally,  although  the  methodology  presented  here  was  originally 
developed  to  deal  with  the  supervisory  ratings  problem,  it  may  be  ap- 
plicable to  a number  of  other  econometric  problems,  such  as  seasonal 
adjustment  and  other  cases  in  which  data  fall  into  natural  groupings. 

SUMMARY


Subjective  evaluations  of  individual  performance,  such  as  those 
provided by supervisors, may be subject to certain kinds of biases.
Yet subjective evaluations are a common and frequently the only source
of information about a person's performance and can therefore be an
important element in the application of manpower policy. Furthermore,
the  development  and  application  of  appropriate  manpower  policies  may 
depend  on  measuring  the  effects  of  specific  variables  on  individual 
performance;  it  is  therefore  important  to  correct  for  biases  in  sub- 
jective evaluations  of  individual  performance. 

This report is concerned with the development of statistical and
econometric  techniques  for  correcting  for  biases  in  models  of  individual 
performance.  The  approach  developed  here  is  a variant  of  the  classical 
linear  regression  model.  Specifically,  it  is  proposed  that  supervisory 
ratings may be subject to two types of bias. The location bias results
when supervisors systematically overestimate or underestimate individual
performance. The scale bias results when supervisors exaggerate or
minimize differences among the individuals rated. This latter effect
gives rise to the name of the model developed here: the multi-scale
model. Finally, the multi-scale estimators are applied to the problem
noted in an earlier Rand report about estimating the cost of on-the-job
training in the military. Indeed, that problem was the genesis of the
multi-scale approach and illustrates the value of the multi-scale model.
Although the model was developed to deal with subjective supervisory
ratings,  the  multi-scale  model  may  be  applicable  to  a wide  variety  of 
other  estimation  problems  where  observations  can  naturally  be  categor- 
ized into  specific  subgroups. 

Several  specific  multi-scale  estimating  techniques  are  developed, 
including  equal  total  variance,  equal  residual  variance,  maximum  like- 
lihood, and  least  squares.  These  differ  primarily  in  the  way  the  scale 
parameters  are  estimated.  Asymptotic  results  are  derived  for  each  of 
the  four  techniques.  However,  because  of  the  difficulty  in  deriving 
small  sample  properties  analytically,  Monte  Carlo  experiments  were 
conducted. The asymptotic and Monte Carlo results, taken together,
suggest  some  practical  guidelines  for  estimation  of  the  multi-scale 
model.  Maximum  likelihood  and  equal  residual  variance  techniques 
yield consistent parameter estimates. However, for small sample sizes
and for configurations with large random errors, the equal total
variance estimator is preferred.

I. INTRODUCTION


Subjective  evaluations  are  an  important  element  of  military  man- 
power policy,  both  in  the  application  of  present  policies  and  in  the 
development  of  new  policies.  For  example,  measures  of  performance  in 
the form of subjective ratings by an individual's superior play a
crucial  role  in  determining  promotions  and  duty  assignments  and  illus- 
trate the  application  of  military  personnel  policy.  Similarly,  the 
development  of  new  personnel  policies  frequently  depends  on  measuring 
the  effects  of  specific  factors  on  individual  performance. 

Although  subjective  evaluations  are  clearly  an  important  input 
to manpower policy, these measures have certain inherent difficulties.
In particular, they are likely to reflect the biases of those providing
the ratings. In some instances, these biases may be deliberate
and  applied  only  selectively  (e.g.,  because  of  personality  conflict 
between  the  rater  and  ratee)  and  cannot  therefore  be  properly  con- 
trolled for.  It  is  probably  more  common,  however,  for  these  biases 
to  be  unintentional  and  systematically  applied,  a result  of  the  fact 
that  raters  may  use  different  implicit  rating  scales  or  may  perceive 
matters  differently.  Some  raters  may  consistently  grade  "easy"  or 
"tough." 

This  report  develops  a methodology — the  multi-scale  model  and  its 
corresponding  estimators— for  estimating  the  systematic  biases  inherent 
in  the  subjective  measures  (of  such  variables  as  individual  performance) 
that  are  often  used  in  the  development  and  application  of  manpower  policy. 
Specifically,  it  is  argued  that  subjective  measures  of,  say,  individual 
performance  may  include  two  types  of  biases.  The  first,  the  location 
bias,  is  the  familiar  problem  that  occurs  when  some  raters  systemati- 
cally overestimate  and  others  systematically  underestimate  the  "true" 
variable.  The  second,  the  scale  bias,  occurs  when  some  raters  exag- 
gerate the  differences  among  those  who  are  rated  while  other  raters 
minimize  these  differences. 

The  approach  adopted  here  incorporates  these  biases  into  the  tra- 
ditional classical  regression  model.  However,  the  presence  of  the 
scale bias invalidates the standard estimating techniques, so that it
becomes  necessary  to  develop  special  multi-scale  estimators.  The 
practical  importance  of  the  multi-scale  model  and  estimators  is  two- 
fold. The  techniques  provide  a way  of  properly  estimating  such  param- 
eters of  the  underlying  model  as  the  effects  of  education,  military 
training,  mental  aptitude,  etc.  on  individual  performance.  The  model 
also  enables  the  analyst  to  construct  "corrected"  measures — adjusted 
for  the  inherent  biases  of  the  subjectively  estimated  variables. 

In  the  next  section,  we  provide  a brief  discussion  of  the  origin, 
structure,  and  applications  of  the  multi-scale  model.  Section  III  dis- 
cusses some  basic  issues  in  estimating  the  multi-scale  model  and  sug- 
gests and  derives  five  specific  estimating  techniques.  Section  IV 
examines  the  mathematical  and  statistical  properties  of  the  estimates. 
Since  the  small-sample  superiority  of  any  of  the  estimates  cannot  be 
proved,  we  have  conducted  an  extensive  series  of  Monte  Carlo  experi- 
ments involving  the  principal  estimation  techniques.  The  results  of 
these  experiments  are  reported  in  Section  V.  Section  VI  applies  the 
multi-scale  model  to  the  supervisory  rating  problem  discussed  earlier. 
Section  VII  outlines  possible  extensions  of  the  model. 

II. THE MULTI-SCALE MODEL


The  multi-scale  model  is  a variant  of  the  classical  linear  regres- 
sion model  in  which  the  dependent  variable  is  subjected  to  a linear 
transformation that varies from group to group. Thus, for the ith
observation in the jth group the observed dependent variable $y_{ij}$ is
related to the "true" dependent variable $z_{ij}$ by

$$y_{ij} = \alpha_j + \delta_j z_{ij} . \qquad (2.1)$$

The model is multi-scale in that the location parameter $\alpha_j$ and the scale
parameter $\delta_j$ may take on different values in each of the $J$ subsets into
which the observations are partitioned. The value of the unobserved dependent
variable is determined by the classical model

$$z_{ij} = X_{ij}\beta + \varepsilon_{ij} , \qquad (2.2)$$

where $X_{ij}$ is a vector of independent variables and the $\varepsilon_{ij}$ are a set of
independent random variables with mean 0 and variance $\sigma^2$. The problem is
to estimate the three vectors of parameters, $\alpha$, $\beta$, and $\delta$. Inasmuch as
only scale effects are being investigated, all values of $\delta_j$ are assumed
to be strictly positive. The full model can be written

$$y_{ij} = \alpha_j + X_{ij}(\delta_j\beta) + (\delta_j\varepsilon_{ij}) . \qquad (2.3)$$
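
As an illustration only, the data-generating process in Eqs. (2.1) and (2.2) can be sketched in a few lines of Python. The function name `simulate_multiscale`, the nested-list data layout, and the use of normally distributed errors are assumptions made for this sketch; they are not taken from the report.

```python
import random


def simulate_multiscale(alpha, delta, beta, X_by_group, sigma, seed=0):
    """Generate observed y_ij = alpha_j + delta_j * (X_ij . beta + eps_ij).

    alpha, delta : per-subset location and scale parameters (length J)
    beta         : coefficient vector (length K)
    X_by_group   : X_by_group[j][i] is the regressor vector of observation i in subset j
    sigma        : standard deviation of the (assumed normal) error term
    """
    rng = random.Random(seed)
    y_by_group = []
    for a_j, d_j, X_j in zip(alpha, delta, X_by_group):
        if d_j <= 0:
            raise ValueError("scale parameters must be strictly positive")
        y_j = []
        for x_ij in X_j:
            # "True" dependent variable, Eq. (2.2)
            z_ij = sum(x * b for x, b in zip(x_ij, beta)) + rng.gauss(0.0, sigma)
            # Observed dependent variable, Eq. (2.1)
            y_j.append(a_j + d_j * z_ij)
        y_by_group.append(y_j)
    return y_by_group


# Two raters: the second grades "harder" (lower alpha) and spreads scores more (larger delta).
X = [[[1.0], [2.0]], [[1.0], [2.0]]]
y = simulate_multiscale([1.0, -2.0], [0.5, 2.0], [3.0], X, sigma=0.0)
```

With `sigma=0.0` the output is a pure linear transformation of $X_{ij}\beta$ within each subset, which makes the separate roles of $\alpha_j$ and $\delta_j$ easy to see.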

Classical  regression  analysis  has  been  extended  to  a number  of 
cases  in  which  the  coefficients  may  vary  in  some  fashion  across  subsets 
of  observations.  It  is  standard  practice,  for  instance,  to  use  separate 
intercept  terms  or  separate  coefficients  for  subsets  of  observations 
sharing  some  common  attribute.  Indeed,  whole  sets  of  procedures,  known 
generally  as  analysis  of  covariance,  have  been  devised  for  determining 
whether sets or subsets of coefficients differ among subsets of observations.
(See Chow [1] and Johnston [2].) In a related area the pooling
of cross-section and time series data poses a number of estimation
problems that have been exhaustively analyzed in the literature on
error components models. (See Wallace and Hussain [3], Balestra and
Nerlove [4], Nerlove [5], [6], and Hsiao [7].) This report represents
an  extension  of  the  literature  on  different  coefficients  and  intercept 
terms  to  a case  where  the  coefficient  vector  may  differ  by  a scale 
factor among subsets of observations. The parameter vector $\gamma_j = \delta_j\beta$
in Eq. (2.3) may take on different values for each subset of observations
$j$; however, unlike the case of pooling data, the parameter vectors
differ by a scale factor rather than being identical or totally
different. Because the $\delta_j$ is a multiplier of $\beta$ as well as the $\varepsilon_{ij}$,
the problem is more than a problem of heteroscedasticity. Hartley and
Jayatillake  [8]  have,  in  fact,  analyzed  the  case  where  the  variance  of 
the  error  term  may  differ  by  subset. 

The  problem  is  also  more  than  a nonlinear  regression  problem, 
since in its conventional interpretation the nonlinear regression problem
can be written as

$$y = g(x,\theta) + \varepsilon , \qquad (2.4)$$

whereas (2.3) can only be written as

$$f(y,x,\theta) = \varepsilon . \qquad (2.5)$$

The  problem  created  by  (2.3)  is,  strictly  speaking,  a multi-scale  prob- 
lem. We  believe  that  the  multi-scale  model  has  considerable  applica- 
bility in  economics  and  the  social  sciences  in  analyzing  data  containing 
rating-scale  phenomena,  in  analyzing  pooled  cross-section  and  time  series 
data,  and  in  analyzing  time  series  data  involving  subannual  observations. 

Because  to  the  best  of  our  knowledge  this  model  has  not  been  ana- 
lyzed previously,  this  report  derives  and  discusses  a variety  of  estimat- 
ing techniques  and  suggests  guidelines  for  using  the  various  estimates. 
Although  some  guidance  is  obtained  from  asymptotic  properties  of  the 
estimates,  our  recommendations  are  principally  based  on  the  results  of 
a series  of  Monte  Carlo  experiments.  We  do  not  explore  in  any  detail 
the properties of the estimates or even the existence of estimates.
Rather it is our desire to focus on the multi-scale model itself, on
practical problems of the choice and computation of estimates, and on
the  application  of  the  model  to  problems  of  statistics  and  econometrics. 
We  have  attempted  to  minimize  the  mathematical  derivation  and  stream- 
line whatever  proofs  are  given.  This  last  is  at  the  expense  of  mathe- 
matical rigor  but  in  keeping  with  the  general  nature  of  the  report. 

The  empirical  problem  that  led  us  to  estimate  the  multi-scale 
model  arose  in  a study  of  on-the-job  training  in  the  Air  Force.  Pro- 
ductivity indices  were  created  for  individual  airmen  on  the  basis  of 
quantitative but somewhat subjective information provided by the airman's
supervisor.2 Multiple observations were available from individual
supervisors. The parameters $\alpha_j$ and $\delta_j$ in (2.1) reflect the fact that
each supervisor apparently used a different rating scale. Moreover,
these differences were reflected in the mean scores ($\alpha_j$) and in the
standard deviations ($\delta_j$) of the subsamples. Figure 1 plots the cost of
on-the-job  training  (OJT)  for  individual  airmen  grouped  under  the  12 
supervisors  in  the  sample.  The  standard  deviations  of  OJT  costs  range 
from  $214  to  $4297  across  the  12  supervisors.  Since  supervisors  typi- 
cally oversee  small  numbers  of  individuals,  statistical  analysis  of  the 
data  is  impossible  unless  data  from  different  supervisors  are  pooled 
together.  Consequently,  it  was  necessary  to  combine  a rating-scale 
model  (2.1)  with  a behavioral  model  (2.2).  We  would  expect  that  sta- 
tistical inference  involving  any  variables  containing  rating-scale 
phenomena  would  give  rise  to  the  multi-scale  model.  Similar  applica- 
tions could  relate  to  personnel  evaluations,  classroom  performance,  or 
other  situations  where  personal  ratings  might  be  used. 

Economists  have  only  recently  and  then  infrequently  come  to  use 
data  based  on  ratings.  As  a result,  the  most  useful  application  of  the 
multi-scale  model  for  economists  may  be  in  the  area  of  pooled  cross- 
section  and  time  series  data.  Pooled  data  usually  involve  combining 


1See Gay [9] and Gay and Nelson [10].

2The cost of OJT for each airman was estimated as the difference
between the airman's productivity, which was provided by his supervisor,
and his wages.
time series data on firms, states, or countries. If the dependent
variable  is  in  the  form  of  an  aggregate  quantity,  such  as  fuel  con- 
sumption or  liquid  asset  balances,  there  will  be  huge  scale  differences 
among  the  different  units  combined  in  the  analysis.  The  theoretical 
model  may  express  the  quantity  demanded  in  the  ith  market  at  time  t as 
a function  of  prices  and  income  distribution  parameters  in  the  market: 

$$q_{it} = \beta_0 + \sum_{k=1}^{K} \beta_k P_{itk} + \beta_{K+1}\mu_{it} + \beta_{K+2}\sigma_{it}^2 + \varepsilon_{it} , \qquad (2.6)$$

where $q_{it}$ = quantity demanded in the ith market in period t,

$P_{itk}$ = price of commodity k in the ith market in period t,

$\mu_{it}$ = mean income of potential buyers in the ith market in
period t, and

$\sigma_{it}^2$ = variance of income of potential buyers in the ith market
in period t.

If the analyst is a good theorist, he may be able to specify a priori
the relevant variables, the functional form, the distribution of $\varepsilon_{it}$,
and whatever dynamic properties the demand may exhibit. The scaling
problem is often handled by using another variable $x_{it}$ either to scale $q_{it}$
by defining a new variable $q_{it}/x_{it}$ or to use it in a weighted regression.1
Only rarely, if ever, is there theoretical justification for
the choice of such a variable.

The  application  of  the  multi-scale  model  could  eliminate  the  need 
to specify an artificial scaling variable $x_{it}$ by estimating the appropriate
scale for each firm or state as well as the parameter vector $\beta$.
Alternatively, one might use dependent variables of the form $\Delta q/q$ or
$\ln q$ to take care of the scaling. But unless such a functional relationship
is suggested by theoretical considerations, these measures may
be just as artificial as the choice of $x_{it}$. Consequently, the multi-scale
model may be a good substitute for several conventional practices

1To be sure, economists have paid considerable attention to the
properties of $\varepsilon_{it}$, particularly $E(\varepsilon\varepsilon')$, in estimating behavioral equations
from pooled data.
in pooled data where scale problems exist. Moreover, even where scale
problems  are  only  suspected,  the  multi-scale  model  can  be  used  to  test 
for  the  presence  of  scale  effects.  This  would  provide  another  tool  in 
the  kit  to  determine  to  what  extent  sets  of  coefficients  are  the  same 
or  different.  The  multi-scale  model  permits  coefficients  to  differ 
but  remain  proportional  across  sets  of  observations. 

A third  application  of  the  multi-scale  model  would  be  in  demand 
or  supply  models  estimated  from  quarterly  or  monthly  data  where  there 
is a strong annual cycle. If it is expected that the parameters $\beta$ and
the variance of the error are also subject to the cycle, the multi-scale
model can be used. This could be written

$$q_{jt} = \alpha_j + \sum_i D_i \delta_i X_{jt}\beta + \delta_j \varepsilon_{jt} , \qquad (2.7)$$

where j is the month, t is the year, and $D_i$ is a dummy variable for
month  i.  Examples  of  data  subject  to  strong  seasonal  fluctuations 
would  include  grain  sales,  military  enlistments,  heating  fuel  consump- 
tion, and  number  of  new  entrants  to  the  labor  market.  With  strong 
seasonal fluctuations, better estimates of $\beta$ can often be made by using
annual data instead of monthly or quarterly data. The absence of a
long time series of data or the presence of structural changes in the
market often makes the use of subannual data necessary. Moreover, the
seasonal pattern itself is often of interest. The estimates $\hat{\alpha}$ and $\hat{\delta}$
from (2.7) can be used, in fact, to construct seasonally adjusted
variables.
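
One plausible reading of "seasonally adjusted" here is to invert the monthly location and scale transformation, that is, to compute $(q_{jt} - \hat{\alpha}_j)/\hat{\delta}_j$ for each observation. The sketch below does exactly that; the function name `seasonally_adjust` and the 0-based month index are hypothetical choices for the illustration, and the estimates are taken as given rather than computed.

```python
def seasonally_adjust(q, month, alpha_hat, delta_hat):
    """Return (q_t - alpha_hat[m]) / delta_hat[m] for each observation.

    q         : observed series
    month     : month index of each observation (0-based, assumed layout)
    alpha_hat : estimated monthly location parameters
    delta_hat : estimated monthly scale parameters (all > 0)
    """
    adjusted = []
    for q_t, m in zip(q, month):
        if delta_hat[m] <= 0:
            raise ValueError("scale estimates must be strictly positive")
        adjusted.append((q_t - alpha_hat[m]) / delta_hat[m])
    return adjusted


# Toy example with two "months": the second has a higher level and a wider spread.
adjusted = seasonally_adjust([10.0, 30.0], [0, 1], [2.0, 6.0], [2.0, 4.0])
```

Because both the level and the spread are removed, months with large swings no longer dominate the adjusted series.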

There  are  probably  other  applications  for  the  multi-scale  model, 
such  as  in  estimating  age-earnings  profiles  where  the  effects  of  educa- 
tion and  ability  differ  by  age;  however,  applications  for  time  series 
and  pooled  cross-section  and  time  series  data  would  seem  to  be  the  most 
likely  uses  for  economists. 

III. ESTIMATION OF THE MULTI-SCALE MODEL


The  assumptions  of  the  multi-scale  model  basically  reflect  the 
assumptions  of  the  classical  normal  linear  model.  In  particular,  we 
assume 


$$y_{ij} = \alpha_j + \delta_j (X_{ij}\beta + \varepsilon_{ij}) , \qquad i = 1, \ldots, T_j ; \quad j = 1, \ldots, J , \qquad (3.1)$$

where the $\varepsilon_{ij}$ are independent random variables each distributed $N(0, \sigma^2)$;
$X_{ij} = (X_{ij1}, \ldots, X_{ijK})$ is a vector of known constants of dimension $K$.
The $T$ cases or observations ($T = \sum_j T_j$) are partitioned into $J$ subsets,
as indicated. The vectors $\alpha = (\alpha_1, \ldots, \alpha_J)$, $\delta = (\delta_1, \ldots, \delta_J)$, and
$\beta = (\beta_1, \ldots, \beta_K)$ are fixed unknown parameters.1 The only restriction
is that $\delta_j > 0$ for all $j$.

Equation (3.1) does not constitute a complete model. In particular,
estimates of the form $\delta^* = k\hat{\delta}$, $\beta^* = \hat{\beta}/k$, and $\sigma^{*2} = \hat{\sigma}^2/k^2$ are
observationally equivalent as $k$ varies. To identify these parameters, the
multi-scale model requires an additional condition on the set of parameter
vectors. It seems most natural to place some restriction on the
vector $\delta$, and we deal with the strictly separable function

$$G(\delta) = G_1(\delta_1) + \cdots + G_J(\delta_J) = 0 \qquad (3.2)$$

as a basis for identifying $\delta$, $\beta$, $\sigma^2$. Assuming that the geometric mean
of the $\delta_j$'s over all observations is 1 leads to

$$\sum_j T_j \ln \delta_j = 0 . \qquad (3.3)$$

1Hsiao [7] has analyzed the error components problem as a random
coefficients model. The treatment of $\alpha$ and $\delta$ as random variables may
prove to be a fruitful approach; however, in this report $\alpha$ and $\delta$ are
fixed parameters.
In the case of the supervisory ratings application, e.g., (3.3) requires
that  the  supervisors  provide  on  average  unbiased  ratings,  in  the  geo- 
metric sense  of  the  term  "on  average." 
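
The identification argument can be made concrete with a small sketch. Given any parameter set $(\delta, \beta, \sigma^2)$, there is exactly one rescaling factor $k$ that makes the $T_j$-weighted geometric mean of the scale parameters equal to 1, as in side condition (3.3). The function name `normalize_scale` and its argument layout are assumptions made for this illustration.

```python
import math


def normalize_scale(delta, beta, sigma2, T):
    """Rescale (delta, beta, sigma2) so that sum_j T_j * ln(delta_j) = 0.

    Uses the observational equivalence delta* = k*delta, beta* = beta/k,
    sigma2* = sigma2/k**2, with k chosen to satisfy side condition (3.3).
    """
    if any(d <= 0 for d in delta):
        raise ValueError("scale parameters must be strictly positive")
    total = sum(T)
    # Log of the T_j-weighted geometric mean of the delta_j
    log_gm = sum(t * math.log(d) for t, d in zip(T, delta)) / total
    k = math.exp(-log_gm)
    delta_star = [k * d for d in delta]
    beta_star = [b / k for b in beta]
    sigma2_star = sigma2 / (k * k)
    return delta_star, beta_star, sigma2_star


# Two subsets of equal size whose raw scale parameters have geometric mean 4.
d_star, b_star, s2_star = normalize_scale([2.0, 8.0], [4.0], 1.0, [1, 1])
```

The products $\delta_j\beta$ and $\delta_j^2\sigma^2$, which are all the data can reveal, are unchanged by the rescaling; only the split between scale and coefficients moves.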

The choice of constraint is a matter of some importance. For instance,
either changing the weights $T_j$ or introducing the "bias" by
setting $\sum_j T_j \ln \delta_j = a$, $a \neq 0$, will change both the absolute and relative
values of the parameters $\delta$, $\beta$, $\sigma^2$. Hence the identifying restriction
is in every sense an integral part of the model.

A similar identification problem may arise between the location
parameter and the intercept term $\beta_0$, if one exists. We assume, however,
that the equation $z_{ij} = X_{ij}\beta$ goes through the origin.1 There is no
possible  confounding  of  intercept  terms  in  this  case.  The  identifica- 
tion of  intercept  terms  is  usually  less  important  to  the  analyst  than 
the  identification  of  coefficients. 


1Note that it is necessary to estimate both the $\alpha_j$'s and $\beta_0$ if one
desires a "corrected" measure of the dependent variable, since

$$y_{ij} = \alpha_j + \delta_j z_{ij}$$

so that

$$z_{ij} = (y_{ij} - \alpha_j)/\delta_j .$$

A plausible assumption would be that $\alpha_j$ equals zero on average (where,
in this instance, we mean arithmetically "on average"), so that the
side condition for $\alpha_j$ becomes

$$\sum_j T_j \alpha_j = 0 .$$

In the remainder of this report, we will assume, without loss of generality,
that $\beta_0$ equals zero.


ESTIMATION STRATEGIES AND TECHNIQUES

The analyst may choose any of a variety of strategies in estimating
the behavioral parameters $\beta$ of the multi-scale model. He may choose
to estimate $\alpha$, $\beta$, and $\delta$ together by applying maximum likelihood (ML)
estimation, least squares (LS) estimation, or some other technique producing
simultaneous estimates of all parameters. Such a strategy invariably
requires iterative methods of estimation and possibly requires
the construction of special software packages. A simpler strategy
would be to try to adjust $y_{ij}$ for the effects of $\alpha_j$ and $\delta_j$ prior to
estimating $\beta$. One method of doing this is to standardize the $y_{ij}$ in
different subsets for means and variances and then to regress the adjusted
variable $z$ on $X$ by means of ordinary least squares (OLS). The
apparent advantage is in the costs of estimation, and the apparent
sacrifice is in not using information on $X$ and $\beta$ in developing estimates
of $\alpha$ and $\delta$. A third strategy, of course, would be to ignore $\delta$
entirely and estimate the parameters $\alpha$ and $\beta$ using OLS.1 We have
already assumed that the scale parameters are distributed around 1.0.
This last strategy is the one implicitly adopted in pooled regressions
where the multi-scale model is not used. In summary, the three estimation
strategies are

I. ML, LS estimates of $\alpha$, $\beta$, $\delta$

II. OLS estimates of $\beta$ with "adjusted" $y$

III. OLS estimates of $\alpha$, $\beta$
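
Strategy II can be sketched for the special case of a single regressor. The code below standardizes $y$ within each subset, rescales the within-subset standard deviations so that their $T_j$-weighted geometric mean is 1 (side condition (3.3)), and then runs OLS of the adjusted variable on the centered regressor. The function name `etv_estimate`, the restriction to one regressor, and the use of the sample standard deviation are assumptions made for this illustration, not the report's stated procedure.

```python
import math
from statistics import mean, stdev


def etv_estimate(y_by_group, x_by_group):
    """Strategy II sketch (single regressor): standardize y, then OLS.

    Returns (alpha_hat, delta_hat, beta_hat).
    """
    T = [len(y_j) for y_j in y_by_group]
    y_bar = [mean(y_j) for y_j in y_by_group]
    x_bar = [mean(x_j) for x_j in x_by_group]
    s = [stdev(y_j) for y_j in y_by_group]  # within-subset std. dev. of y
    if any(s_j <= 0 for s_j in s):
        raise ValueError("each subset needs some variation in y")

    # Normalize so that sum_j T_j * ln(delta_hat_j) = 0, as in (3.3).
    log_gm = sum(t * math.log(s_j) for t, s_j in zip(T, s)) / sum(T)
    delta_hat = [s_j / math.exp(log_gm) for s_j in s]

    # OLS of adjusted, centered y on centered x, pooled over subsets.
    num = 0.0
    den = 0.0
    for y_j, x_j, yb, xb, d in zip(y_by_group, x_by_group, y_bar, x_bar, delta_hat):
        for y, x in zip(y_j, x_j):
            z = (y - yb) / d
            num += (x - xb) * z
            den += (x - xb) ** 2
    beta_hat = num / den

    # Location parameters recovered afterwards from the subset means.
    alpha_hat = [yb - d * xb * beta_hat for yb, xb, d in zip(y_bar, x_bar, delta_hat)]
    return alpha_hat, delta_hat, beta_hat


# Noise-free toy data: alpha = (1, -3), delta = (0.5, 2), beta = 2, same x in both subsets.
x = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
y = [[2.0, 3.0, 4.0], [1.0, 5.0, 9.0]]
alpha_hat, delta_hat, beta_hat = etv_estimate(y, x)
```

In this toy case the regressor has the same distribution in both subsets, so the within-subset spread of $y$ reflects only the scale parameters and the sketch recovers them exactly. When the regressor distributions differ across subsets, the standardization step mixes scale effects with differences in $X$.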

Although strategies II and III may have the meager appearance of
straw men, there is no guarantee that adopting the more elaborate approach
of strategy I uniformly produces the best results. Figure 2
shows the strategy and technique producing minimum mean-square error
in estimates of $\beta$ in one series of Monte Carlo experiments conducted
for this study. Individual experiments differ according to $R^2$ (the
coefficient of determination) and the variance of $\ln \delta$ in the multi-scale
model. Perhaps surprisingly, each strategy offers a region of
superiority. Where all $\delta$ values are near unity, strategy III is superior
in that it is better to ignore $\delta$ than try to estimate it. Where the $R^2$
is small (less than .30) and where the $X_{ij}$ have similar distributions
for different values of $j$, strategy II is superior. There apparently
is little error due to standardizing values of $y$ in this homogeneous
case. However, where the $X_{ij}$ have quite dissimilar distributions


1Actually, the third strategy consists of OLS with dummy variables
for $\alpha$ (since a fourth strategy could be to ignore both the $\alpha$
and $\delta$).


Fig. 2. Regions of superiority for different estimators
of the multi-scale model
across subsets (not shown), this region of superiority is sharply reduced.
Since strategy II ignores the effects of $X$ and $\beta$ in estimating
$\delta$, its value depends crucially on the homogeneity of the $X_{ij}$ across
subsets. In the remaining areas of Fig. 2, strategy I provides better
estimates of $\beta$.

We derive and evaluate five estimating techniques for the multi-scale
model: (1) OLS; (2) the technique for adjusting the dependent
variable before using OLS, referred to here as the equal total variance
(ETV) technique; (3) ML estimation; (4) LS estimation; and (5)
another simultaneous technique that determines $\delta$ such that there is
equal residual variance (ERV) across subsets. The first four have
been mentioned previously; the last technique is heuristically determined
based on the expectation that, if $\delta$ is controlled for, the variance
of the residuals should be approximately equal across subsets of
observations.

Each of the five techniques provides four sets of conditions, which
can be associated with the parameters $\alpha$, $\beta$, $\delta$, and $\sigma^2$. If treatment of
degrees of freedom is standardized, the conditions for $\alpha$, $\beta$, and $\sigma^2$ are
identical for each of the five estimating techniques. The details for
ML and LS estimates are provided in the appendix. Thus, as in the case
with OLS, the intercept terms $\alpha$ can be estimated after the other parameters,
because the following equation defines the estimator of $\alpha_j$ for
all methods:

$$\hat{\alpha}_j = \bar{y}_j - \hat{\delta}_j \bar{X}_j \hat{\beta} , \qquad (3.4)$$

where $\bar{y}_j$ and $\bar{X}_j$ are the means for subgroup $j$. Hereafter, in fact, we eliminate
$\alpha$ from the model by redefining $y_{ij}$ and $X_{ij}$ as $y_{ij} - \bar{y}_j$ and $X_{ij} - \bar{X}_j$.
Thus, the multi-scale model is

$$y_{ij} = \delta_j (X_{ij}\beta + \varepsilon_{ij}) . \qquad (3.5)$$

1Thus, techniques (3), (4), and (5) belong to broad strategy I
outlined earlier.

2Maximum likelihood estimates do not provide for adjustments for
degrees of freedom. Here, as in other applications, we use ML estimates
adjusted for degrees of freedom.

The condition for $\hat{\beta}$ is

$$\hat{\beta} = (X'X)^{-1} X'z , \qquad (3.6)$$

where $z_{ij} = y_{ij}/\hat{\delta}_j$. Thus in all cases the condition for $\hat{\beta}$ is simply
the OLS regression of $z$ on $X$.

Finally, in each case we choose

$$\hat{\sigma}^2 = \frac{1}{T - 2J - K} \sum_j \sum_i (z_{ij} - X_{ij}\hat{\beta})^2 , \qquad (3.7)$$

where $2J + K$ represents the total number of parameters in $\alpha$, $\beta$, and $\delta$.
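
Given a set of scale estimates, the conditions for the coefficient vector and the error variance reduce to an ordinary regression on the adjusted data. The sketch below assumes that $y$ and $X$ have already been centered within each subset, solves the normal equations with a small Gauss-Jordan routine, and divides the residual sum of squares by $T - 2J - K$. The function names `solve_linear` and `beta_and_sigma2` are hypothetical, and the exact degrees-of-freedom divisor is an assumption inferred from the parameter count $2J + K$.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def beta_and_sigma2(y_by_group, X_by_group, delta_hat):
    """OLS of z = y / delta_hat on X, plus the residual variance estimate.

    y and X are assumed to be centered within each subset already.
    """
    J = len(y_by_group)
    K = len(X_by_group[0][0])
    XtX = [[0.0] * K for _ in range(K)]
    Xtz = [0.0] * K
    rows = []
    for y_j, X_j, d_j in zip(y_by_group, X_by_group, delta_hat):
        for y, x in zip(y_j, X_j):
            z = y / d_j  # adjusted dependent variable
            rows.append((x, z))
            for a in range(K):
                Xtz[a] += x[a] * z
                for c in range(K):
                    XtX[a][c] += x[a] * x[c]
    beta_hat = solve_linear(XtX, Xtz)
    sse = sum((z - sum(xa * ba for xa, ba in zip(x, beta_hat))) ** 2 for x, z in rows)
    dof = len(rows) - 2 * J - K
    if dof <= 0:
        raise ValueError("not enough observations for T - 2J - K degrees of freedom")
    return beta_hat, sse / dof


# One subset, one regressor, delta_hat = 2.
beta_hat, sigma2_hat = beta_and_sigma2(
    [[-10.0, -10.0, 10.0, 10.0]], [[[-2.0], [-1.0], [1.0], [2.0]]], [2.0]
)
```

For the simultaneous techniques this step would be repeated each time the scale estimates are updated; for OLS and the adjusted-variable technique it is carried out once.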

The only differences among the five estimating techniques are the
conditions associated with estimates of $\delta$. These are listed below, beginning
with ML, LS, and ERV, the three simultaneous techniques. The
parameter $\lambda$ in ML and LS is a Lagrange multiplier attached to the side
condition. The results are for the general form of the side condition
(3.2). In particular, for each $j = 1, \ldots, J$,

ML: (3.8.1)

LS: (3.8.2)

ERV: (3.8.3)

ETV: (3.8.4)

OLS: $\hat{\delta}_j = 1$. (3.8.5)
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The  conditions  for  ERV,  i V,  and  OLS  follow  directly  from  the  defi- 
nitions of  the  techniques.  ERV,  for  instance,  required  the  variance  of 
the  residuals  to  be  equal  in  each  subset,  and  ETV  requires  the  variance 
of  the  adjusted  dependent  variable  to  be  equal  across  subsets,  while  OLS 
merely  accepts  6 as  a constant.  The  conditions  for  ML  and  LS,  however, 
are  taken  from  the  first-order  conditions  for  the  maximization  or  min- 
imization carried  out  in  the  two  techniques.  The  derivations  appear 
in  the  appendix.  LS  and  ML  estimates  are  formulated  as  Lagrange  multi- 
plier problems  because  they  represent  the  extrema  of  function  subject 
to  a single  constraint.  The  conditions  for  LS  and  ML  both  involve  the 
cross-product  between  the  residual  e^.  = (y^./S.  - X^g)  and  the  ad- 
justed dependent  variable  within  each  subset.  The  cross-product  is 
- 2 

equal  to  0 for  ML  estimates  and  to  zero  for  LS  estimates,  once  com- 
pensation has  been  made  for  the  side  condition.  Under  side  condition 

(3.3)  witli  weights  T - 1 instead  of  T.,  the  expression 

$$\frac{\partial G}{\partial \delta_j} \cdot \frac{\delta_j}{T_j - 1} = 1. \tag{3.9}$$


This results in numerically identical estimates for ML and LS. As is
shown below, this is the only such side condition that produces
identical results for ML and LS estimates.

The third technique, ERV, requires that the residual variance be
equal (to $\hat{\sigma}^2$) across subsets. This condition is superficially similar
to (3.8.1) and, in the limit, ML and ERV produce identical results.
Equations (3.8.4) and (3.8.5) are the conditions for ETV, based on the
adjusted dependent variable, and OLS. Neither condition uses the full
information of the model, and each can produce efficient estimates of
$\beta$ only in some very special circumstances.


THE  EXISTENCE  OF  SOLUTIONS  AND  A METHOD  OF  COMPUTATION 

The system of normal equations produced by any of the simultaneous
estimation techniques does not yield a closed form solution. For the ML
normal equations there is a unique solution where all $\hat{\delta}_j > 0$ if and only
if the data matrix obeys some quite reasonable conditions. More
generally there are $2^J$ solutions to the system of ML equations. Only one
solution will have all values of the $\hat{\delta}_j > 0$. The $2^J$ solutions may be
thought of as J pairs of values for $\hat{\delta}_j$, $A_j + |B_j|$ and $A_j - |B_j|$, where
$|B_j| > A_j$. There are $2^J$ combinations but only one where all $\hat{\delta}_j > 0$.

The necessary and sufficient conditions for this result can be
simply stated. Let Y represent the $T \times J$ matrix, which places values
of $y_{ij}$ in separate columns according to subgroup. Define
$Q = Y'(I - X(X'X)^{-1}X')Y = Y'MY$, where $M = I - X(X'X)^{-1}X'$ is an
idempotent matrix. Then the necessary and sufficient condition for a
unique positive solution for the ML equation is that

$$\det Q \neq 0$$

or that Q be of full rank (rank = J). This condition will not hold if
(1) the columns (variables) are not linearly independent; (2) there is
a perfect fit between $y_{ij}$ and $X_{ij}$ for all observations in any subgroup;
(3) there is no variation in $y_{ij}$ in any subgroup; or (4) there is only
one observation in any subgroup. Assuring the existence of a solution
in the ML case requires (1) a proper specification of the variables and
(2) the elimination of any subgroup satisfying any of the conditions
(2)-(4).
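
The rank condition on Q can be checked numerically before estimation. The following is a minimal sketch in plain Python, assuming a single explanatory variable and data already expressed as deviations from subgroup means. The function names `q_matrix`, `determinant`, and `has_unique_positive_solution` are illustrative and are not part of the program documented in [11].

```python
def q_matrix(y, x, groups):
    """Build Q = Y'MY for one regressor x, where M = I - x(x'x)^{-1}x'.

    y, x   : lists of observations (deviations from subgroup means)
    groups : subgroup label of each observation
    """
    labels = sorted(set(groups))
    # Column j of Y holds y for subgroup j and zeros elsewhere.
    columns = [[yi if g == lab else 0.0 for yi, g in zip(y, groups)]
               for lab in labels]
    xx = sum(v * v for v in x)

    def residual(col):
        # M * col: residual from regressing the column on x.
        b = sum(c * v for c, v in zip(col, x)) / xx
        return [c - b * v for c, v in zip(col, x)]

    resid = [residual(col) for col in columns]
    # M is symmetric and idempotent, so Y'MY = (MY)'(MY).
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in resid]
            for ri in resid]


def determinant(matrix, tol=1e-12):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[pivot][c]) < tol:
            return 0.0  # treated as singular
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det
        det *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return det


def has_unique_positive_solution(y, x, groups):
    """True when Q is of full rank (det Q != 0)."""
    return determinant(q_matrix(y, x, groups)) != 0.0
```

A subgroup with no variation in y contributes a column of zeros to Y, so the corresponding row and column of Q vanish and the check fails, which matches condition (3) above.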

To compute the estimates of the multi-scale model, we have developed
an iterative approach that converges rather quickly to a set of
parameters satisfying (3.4), (3.5), (3.6), (3.7) and one of the conditions
(3.8.1)-(3.8.3). This is an approximate solution to the system of
equations, but as indicated, only one of several possible solutions
where the system is quadratic. Negative roots of the quadratic have
been eliminated since these produce negative estimates of $\delta_j$. This
computation procedure has been applied in literally thousands of
regressions in the Monte Carlo experiment and in no case did it yield
unreasonable or outlandish results.


1 This result was provided by Gus C. Haggstrom.

The estimating procedure can be described in six steps beginning
with raw data, not grouped by subset. Documentation of this program
has been provided by Smith [11]. The steps are as follows:

1.  Raw data are ordered by subgroup. Sample means are calculated
by subset for the dependent and independent variates,
and these variates are then expressed in terms of deviations
from the subgroup means. (This permits the vector of intercept
terms to be estimated after all other parameters have
been estimated.)

2.  Initial trial values of the $\hat{\delta}_j$ are obtained from the standard
deviations of $y_{ij}$ in each subgroup. These are normalized to
conform to the logarithmic constraint (3.3). An "adjusted"
dependent variable is found by dividing y by the estimate
of $\delta_j$. (This is simply the ETV procedure. If ETV estimates
are desired, it is necessary only to calculate the OLS based
on the adjusted dependent variable.)

3.  Initial estimates of $\beta$ are obtained by regressing the adjusted
dependent variable on the independent variates.

4.  Given the estimates of $\beta$, new estimates of $\delta$ are obtained (and
the Lagrange multiplier $\lambda$ where applicable). A gradient search
technique (Newton's method) is used to find the appropriate
Lagrange multiplier. Acceptable accuracy can usually be found
within about five iterations. Given the proper $\lambda$, the values
of $\delta$ can be calculated directly from the J equations involving
$\delta$.

5.  Steps (3) and (4) are repeated until the values of $\beta$ and $\delta$
converge. A criterion is used that the maximum change in any $\delta$,
which is the most sensitive parameter, must be less than .001.
Usually, fewer than four iterations are required.

6.  Given values of $\beta$ and $\delta$, estimates of $\alpha$ can be calculated.
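
Steps 1 through 3 and step 6 together give the ETV estimate, and they can be sketched compactly. The following is a minimal illustration in plain Python, not the documented program: it assumes a single explanatory variable, assumes that the logarithmic constraint (3.3) takes the form $\sum_j T_j \ln \delta_j = 0$, and uses the hypothetical function name `etv_estimate`. The iterative updating of steps 4 and 5 is not shown.

```python
import math


def etv_estimate(y, x, groups):
    """ETV sketch for one regressor.

    Returns (beta, delta, alpha), with delta and alpha as dicts keyed by
    subgroup label. Every subgroup must have at least two observations
    and some variation in y.
    """
    labels = sorted(set(groups))
    idx = {g: [i for i, gi in enumerate(groups) if gi == g] for g in labels}

    # Step 1: subgroup means and deviations from them.
    ybar = {g: sum(y[i] for i in idx[g]) / len(idx[g]) for g in labels}
    xbar = {g: sum(x[i] for i in idx[g]) / len(idx[g]) for g in labels}
    yd = [yi - ybar[g] for yi, g in zip(y, groups)]
    xd = [xi - xbar[g] for xi, g in zip(x, groups)]

    # Step 2: trial delta_j = subgroup standard deviation of y, rescaled so
    # that sum_j T_j * ln(delta_j) = 0 (assumed form of constraint (3.3)).
    sd = {g: math.sqrt(sum(yd[i] ** 2 for i in idx[g]) / (len(idx[g]) - 1))
          for g in labels}
    log_gm = sum(len(idx[g]) * math.log(sd[g]) for g in labels) / len(y)
    delta = {g: sd[g] / math.exp(log_gm) for g in labels}

    # Step 3: OLS of the adjusted dependent variable z = y / delta_j on x.
    z = [ydi / delta[g] for ydi, g in zip(yd, groups)]
    beta = sum(a * b for a, b in zip(xd, z)) / sum(a * a for a in xd)

    # Step 6: intercepts recovered from the subgroup means, as in (3.4).
    alpha = {g: ybar[g] - delta[g] * xbar[g] * beta for g in labels}
    return beta, delta, alpha
```

With noise-free data and the same spread of x in every subgroup, this sketch recovers the generating values exactly, which is the situation in which the text describes ETV as appropriate.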

Data processing has been performed on the IBM 370/158. The average
cpu time per estimate has been 4 seconds with 250 total observations, 50
subgroups, and two independent variables for the ML technique. Average
cpu time for the ETV technique (basically a single regression) is 1.1 seconds.

IV.  PROPERTIES AND CHARACTERISTICS OF THE ESTIMATES


Ideally we would like to be able to derive the small-sample
distributions of the five estimating techniques and base our choice of
estimates principally on these theoretically determined distributions.
Only large-sample properties and distributions of the estimates can be
examined; consequently, the ultimate basis for our choice of estimates
will be the Monte Carlo experiments of Section V. An analysis of the
statistical properties and mathematical characteristics of the
estimates contributes to an understanding of the estimation problem and
provides more guidance in the choice of estimates.

This section is devoted to two topics: (1) the consistency and
asymptotic normality of the estimates and (2) asymptotic variance of
the estimates.


CONSISTENCY  AND  ASYMPTOTIC  NORMALITY 

The characteristics of the multi-scale model do not lend themselves
to a mathematical analysis of small-sample properties. Neither
unbiasedness nor minimum variance, for instance, can be demonstrated
for finite sample sizes. We must restrict ourselves to the asymptotic
properties. In the multi-scale problem, where observations are grouped
into subsets, the question of consistency is complicated considerably
by the fact that sample size can be increased by increasing the number
of observations per subset ($T_j$) or by increasing the number of subsets
J, or both. When sample size is increased by increasing the number of
subsets J, however, the number of parameters to be estimated also
increases: two new parameters for each new subset.

In general, we note that if $\xi$ is a parameter vector of m elements,
then $\hat{\xi}$ is said to be a consistent estimator of $\xi$ if

$$\operatorname*{plim}_{n/m \to \infty} \hat{\xi} = \xi.$$

That is, merely assuming that $n \to \infty$ may not be sufficient to guarantee
that a probability limit exists for the $\hat{\xi}$ vector.1 This takes on a
special importance for the multi-scale model since m = 2J + K, where K
is the number of explanatory variables in the model. Therefore, since
$n = T_j \cdot J$ (if the subgroup size is the same for all subsets), then
n/m approaches $T_j/2$ in the limit as n is increased by increasing J.
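
A short numerical illustration of this limit may help. The sketch below simply evaluates n/m for n = T_j J and m = 2J + K; the function name `observations_per_parameter` is illustrative.

```python
def observations_per_parameter(num_subsets, obs_per_subset, num_regressors):
    """Return n/m with n = T_j * J observations and m = 2J + K parameters."""
    n = obs_per_subset * num_subsets
    m = 2 * num_subsets + num_regressors
    return n / m


# Adding subsets (J) with T_j = 5 and K = 2: the ratio approaches T_j/2 = 2.5.
for J in (5, 50, 100, 10_000):
    print(J, round(observations_per_parameter(J, 5, 2), 4))

# Adding observations per subset (T_j) with J = 5: the ratio keeps growing.
for T_j in (5, 50, 500):
    print(T_j, round(observations_per_parameter(5, T_j, 2), 4))
```

Only the second route, thicker subsets, lets the number of observations per parameter grow without bound.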

The implication of this is that when sample size is increased by
adding more subsets, the probability limit as $n \to \infty$ for the parameter
vector $\hat{\theta}$, where $\hat{\theta} = (\hat{\beta}, \hat{\alpha}, \hat{\delta})$, does not exist.

The probability limit for the entire $\hat{\theta}$ vector exists only when
$T_j \to \infty$ for each subgroup so that we can speak of consistency only when
the number of observations per subset increases without bound. That
is, the "large sample" in the multi-scale model means many observations
per subset.2 This is unfortunate in a way because, as the number of
observations per subset grows, the need to pool data from different
rating systems diminishes. Moreover, practical limitations may require
that additional observations be created through increasing the number
of subsets rather than their thickness. Thus, one can add supervisors
to the sample but not necessarily the number of cases each supervisor
evaluates. Monte Carlo experiments must be used to assess the
estimating techniques under different sample configurations.


ML  Estimates 

Under some very general conditions maximum likelihood estimates
are consistent, jointly asymptotically normal, and jointly asymptotically
efficient. Hoadley [14] and Bradley and Gart [15] have considered the
case of independent, not identically distributed random variables,
such as $Y_{ij}$ in the multi-scale model. The authors conjecture, but have
not verified, that with appropriate restrictions on the constants X
the multi-scale model satisfies the conditions of Hoadley and of Bradley
and Gart; and the ML estimates are consistent, jointly asymptotically
normal, and jointly asymptotically efficient.

1 An extreme example is the problem of estimating n means with n
observations posed by Kendall and Stuart [12], p. 61; and Zellner [13],
p. 114.

2 It is important to note that while the probability limit for the
$\hat{\theta}$ vector may not exist for $J \to \infty$, the probability limit for $\hat{\beta}$ may very
well exist for $J \to \infty$. However, because we rely on the consistency of
$\hat{\alpha}$ and $\hat{\delta}$ to show the consistency of $\hat{\beta}$, we cannot show the consistency
of $\hat{\beta}$ when $J \to \infty$. Nevertheless, the Monte Carlo results in Section V
suggest that the marginal distribution for $\hat{\beta}$ may converge when $J \to \infty$,
holding subset size constant. This has the important implication that
when one can increase sample size only through the addition of more
subsets, one can get more precise estimates of $\beta$ (the parameter vector
likely to be of most concern to the analyst) even though the estimates
of $\alpha$ and $\delta$ are not consistent.

Furthermore, the satisfaction of certain necessary conditions for
consistency can be proved directly. As indicated previously, there is
a unique solution to the ML equations with all $\hat{\delta}_j > 0$ under rather weak
restrictions. This solution is the ML estimate. This result will also
hold in the limit as all $T_j \to \infty$. Moreover, it can be shown that the
set of normal equations for the ML estimates in the limiting case has
the following solution:

$$\operatorname{plim} \hat{\beta} = \beta$$
$$\operatorname{plim} \hat{\delta} = \delta$$
$$\operatorname{plim} \hat{\alpha} = \alpha$$
$$\operatorname{plim} \hat{\lambda} = 0.$$

This implies that ML equations in the limiting case yield the true
parameter values as a solution. This line of reasoning does not fully
establish the consistency of ML estimates, since we have not
demonstrated the existence of a sequence of values of the parameter vector
$\hat{\theta}$ for which $\theta$ is the limit; however, we are reasonably confident in
our conjecture that ML estimates are consistent.


LS  Estimates 

We have already made reference to the fact that least squares
estimates are identical to ML estimates under side condition (3.3). This
is because under (3.3), conditions (3.8.1) and (3.8.2) become

ML:   (4.2.1)

LS:   (4.2.2)

Here the $\lambda$ in each condition is the Lagrange multiplier corresponding
to condition (3.3). Since $\hat{\sigma}^2$ is the same for all j, (4.2.1) and (4.2.2)
yield identical solutions. The appendix demonstrates that for (3.8.1)
and (3.8.2) together with (3.4), (3.6), and (3.7) to yield identical
estimates of $\alpha$, $\beta$, $\delta$, and $\sigma^2$, the side condition must be

$$G(\delta) = c_0 \sum_j T_j \ln \delta_j + c_1 = 0, \tag{4.3}$$

where $c_0$ and $c_1$ are arbitrary constants. A corollary to this is that
where (4.3) does not hold, ML and LS estimates will be different for
at least some values of y and X. Moreover, this difference is generally
independent of sample size, so that LS estimates are different
from ML estimates at all sample sizes even in the limiting case and,
therefore, are inconsistent.


ETV  Estimates 

Under ETV, estimates of $\beta$ are the OLS estimates from a regression
of y (adjusted for variance) on X. In particular,

$$\hat{\beta} = \sum_{j=1}^{J} \frac{1}{\hat{\delta}_j} (X'X)^{-1} X_j' y_j, \tag{4.4.1}$$

where $X_j$ is the $T_j \times K$ matrix of independent variables for subgroup j.
Under any normalization rule, the ratio $\hat{\delta}_j/\hat{\delta}_h$ from (3.6.2) is

$$\frac{\hat{\delta}_j}{\hat{\delta}_h} = \left[ \frac{\sum_i y_{ij}^2 / (T_j - 1)}{\sum_i y_{ih}^2 / (T_h - 1)} \right]^{1/2}. \tag{4.5}$$

Taking probability limits as $T_j \to \infty$ and $T_h \to \infty$ yields

$$\operatorname{plim} \frac{\hat{\delta}_j}{\hat{\delta}_h} = \frac{\delta_j}{\delta_h} \left( \frac{\sigma^2 + \omega_j}{\sigma^2 + \omega_h} \right)^{1/2}, \tag{4.6}$$

where $\omega_j$ is the $\lim\, 1/(T_j - 1)\,(\beta' X_j' X_j \beta)$ (the "explained variance") and is
assumed to exist and be non-zero. The consistency of the ratio $\hat{\delta}_j/\hat{\delta}_h$
depends on having the same explained variance in each subset. Thus,
if the rows of $X_j$ can be viewed as coming from a distribution that is
more homogeneous than the distribution of $X_h$, then $\omega_j$ will be smaller
than $\omega_h$ and plim $(\hat{\delta}_j/\hat{\delta}_h)$ will be too small. This result should be quite
intuitive. Under ETV the value of $\hat{\delta}_j$ is determined without any
information on $X_j$. The technique attributes to $\delta_j$ any variation regardless of
source. If $X_1'X_1 = X_2'X_2 = \ldots = X_J'X_J$ this would seem to be a perfectly
appropriate technique for any sample size.
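
The size of this effect is easy to tabulate. The sketch below assumes that the ETV scale estimate is the subgroup standard deviation of y and that, after removing subgroup means, $y_{ij} = \delta_j (X_{ij}\beta + \varepsilon_{ij})$, so that the limiting standard deviation in subgroup j is $\delta_j (\sigma^2 + \omega_j)^{1/2}$. The function names and the numerical values are hypothetical.

```python
import math


def etv_plim_ratio(delta_j, delta_h, sigma2, omega_j, omega_h):
    """Limiting value of the ETV ratio delta_j-hat / delta_h-hat.

    sigma2  : disturbance variance
    omega_* : limiting "explained variance" in each subgroup
    """
    return (delta_j / delta_h) * math.sqrt((sigma2 + omega_j) / (sigma2 + omega_h))


def relative_bias(sigma2, omega_j, omega_h):
    """Limiting ratio divided by the true ratio delta_j / delta_h."""
    return etv_plim_ratio(1.0, 1.0, sigma2, omega_j, omega_h)


# Hypothetical values: subgroup j has one quarter of the explained
# variance of subgroup h, and the disturbance variance equals omega_j.
print(relative_bias(sigma2=500.0, omega_j=500.0, omega_h=2000.0))
# Equal explained variance: the ratio is recovered without distortion.
print(relative_bias(sigma2=500.0, omega_j=500.0, omega_h=500.0))
```

When the explained variances differ, the ratio is understated for the more homogeneous subgroup, and the distortion shrinks as the disturbance variance grows relative to the explained variance.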

The exact degree of inconsistency in $\hat{\beta}$ (defined as plim $\hat{\beta}/\beta$) can
be determined where there are only two subsets of observations. We
assume further that


D lim  ~ X^X2 


(4.7) 


where $\rho$ is the ratio of the variance of the two subsets. The
probability limit of $\hat{\beta}$ as $T_1 = T_2 \to \infty$ is


plim  0 


' 

1 

(1  + p)  - r(l  - p) 

1 

4 + P 

(1  + p)  + r(l  - p) 

*1 

1 + p 

(1  + p)  + r(l  - p) 

1 + P 

(1  + p)  - r (1  - p) 

(4.8)

The parameter r is the limiting value of the $R^2$ of the model.1 The
limiting values of plim $\hat{\beta}/\beta$ as $\rho$ and r approach extreme values
amplify the results of (4.5). In particular

$$\lim_{\rho \to 1} \operatorname{plim} \hat{\beta}/\beta = 1 \tag{4.9.1}$$

$$\lim_{r \to 0} \operatorname{plim} \hat{\beta}/\beta = 1 \tag{4.9.2}$$


lim  plim  8/8 
p-x» 


(4.9.3) 


1 3 

4 4 

lim  plim  6/8  = • 

r-1  1 + P 


(4.9.4) 


As the degree of heterogeneity or as $R^2$ diminishes, the degree of
inconsistency in $\hat{\beta}$ also diminishes. ETV works best (in the
limit) for models with a small $R^2$ because $\sigma^2$ tends to swamp the
values $\beta' X_1' X_1 \beta$ and $\beta' X_2' X_2 \beta$ (see (4.5)). This therefore reduces the
error from ignoring the explanatory variables in estimating $\delta$.


ERV  Estimates 


The most important point to make about ERV estimates is that they
approach ML estimates as all $T_j \to \infty$. Equality of ERV and ML estimates
at all values of y and X requires that there be no difference between
(3.8.1) and (3.8.3). This requires


1 In particular,

$$r = \frac{\tfrac{1}{2}(\omega_1 + \omega_2)}{\sigma^2 + \tfrac{1}{2}(\omega_1 + \omega_2)}.$$

This yields


’ v)v  + 1% 


(4.11) 


If $T_j \to \infty$ for all j, the second term will tend to zero. The first term
is zero if $\hat{\beta}$ is the same as the least squares estimate of $\beta$ based only
on observations in the jth subset.1 $\hat{\beta}$, in fact, is the least squares
estimate based on all observations. However, as $T_j \to \infty$ for all j,
$\hat{\beta}$ estimated from the jth subset approaches $\hat{\beta}$. In the limit, then,
(4.10) is satisfied, and ERV and ML estimates are equivalent.

A practical problem in applying ERV is that in models with small
sample sizes and a high $R^2$, the probability of obtaining solutions with
imaginary components is substantial. In cases where this occurred we
adopted the procedure (certainly unsound) of setting the imaginary
component equal to zero. This poses a serious practical drawback to its use.

ASYMPTOTIC VARIANCE OF $\hat{\theta} - \theta$

The literature on asymptotic properties of ML estimates suggests
that the asymptotic variance of $\hat{\theta} - \theta$ is


lim^Var  (6-0)  = ^ T X(0)  , 


That  is. 


iGf-vh 


for  all  j , 


8 = o if  6 


(X'x  ) X' 

1 J J Sj 


(4.12.1)

where

$$\Gamma(\theta) = E\left( \frac{\partial^2 \log L}{\partial \theta^2} \right). \tag{4.12.2}$$


This suggests an estimator for the variance of the estimates of the
multi-scale model. Under this estimate it can be shown that the
variance-covariance matrix of the behavioral parameters $\hat{\beta}$ will not be equal to

$$\frac{\hat{\sigma}^2}{T} (X'X)^{-1}$$

unless

$$\frac{\partial^2 \log L}{\partial \beta \, \partial \delta'} = 0. \tag{4.13}$$

The appendix shows that (4.13) holds if and only if $X_j'X_j$ is equal for
every subset j. Thus, the "t-statistics" for $\hat{\beta}$ from the classical
normal model will hold asymptotically for the multi-scale model if and
only if the moment matrix $X_j'X_j$ is the same for every subset.

If the independent variables do not have the same dispersion
matrix, a correction to $\hat{\sigma}^2 / T \, (X'X)^{-1}$ is required. This will be given by
calculation of $\Gamma(\theta)$. In the specific case of one variable and two
subsets we have calculated the specific asymptotic variance of $\hat{\beta}$:


Var  (6  - B) 


-2 

o 


2 2 

2 + 6 <r 

X 


2 + e2o2  O2 

X1  X2 


/c 


where $\sigma_{x_1}^2$, $\sigma_{x_2}^2$, and $\sigma_x^2$ are the "variances" of x from the two subsets
and the total sample. The value of the term in parentheses attains a
value of 1.0 where $\sigma_{x_1}^2 = \sigma_{x_2}^2 = \sigma_x^2$, but otherwise is greater than unity.
Hence the asymptotic variance of $\hat{\beta}$ is at least as great as the variance
of $\hat{\beta}$ in the classical normal model.




SUMMARY  OF  ESTIMATES 

We have presented some of the mathematical characteristics and
statistical properties of ML, LS, ERV and ETV in this section and in
the appendix. Table 1 briefly recapitulates the most important
features of all five methods of estimating the multi-scale model. The
results of this section suggest, if anything, that ML estimates display
the fewest bad features if one ignores computational costs.
Nevertheless, there are many cases where ML estimates have impressive asymptotic
properties but are not the best estimates in small-sample situations.
In the absence of any specific guidance on small-sample properties, our
method has been to rely on Monte Carlo experiments to determine the
superior estimating techniques. This is the subject of the following
section.


Table 1

FEATURES OF MULTI-SCALE ESTIMATES

Feature              ML(a)           LS                ERV                 ETV               OLS

Type of solution     Iterative       Iterative         Iterative           Noniterative      Noniterative

Invariance of        Not invariant   Not invariant     Invariant           Invariant         Invariant
$\hat{\beta}$ to (3.2)

Values of            Real,           Real, may be      May be complex;     Real,             N.A.
$\hat{\delta}$(b)    positive        negative(c)       positive if real    positive

Consistency          Consistent      Inconsistent(c)   Consistent          Inconsistent(d)   Inconsistent

(a) Adjusted for degrees of freedom.
(b) Actually, solutions to normal equations for $\hat{\delta}$, provided solutions exist.
(c) Consistent and positive if (3.3) holds.
(d) Consistent if plim $1/T_j \, (\beta' X_j' X_j \beta)$ is equal across subsets.


V.  MONTE CARLO RESULTS


Specification of an estimator's distribution is an important
aspect of the development of any econometric estimating procedure. Such
information assumes a special importance in the present study insofar
as several alternative estimators have been developed for estimating
the parameters of the multi-scale model. Therefore, knowledge of the
statistical properties of these alternatives is important not only for
establishing the statistical reliability of any given estimate, but
also for selecting the appropriate multi-scale estimator under
different sample conditions. Indeed, it was stated at the outset that no
single estimator is dominant over the entire range of possibilities.
Instead, the appropriateness of any of the multi-scale estimators
depends, among other things, upon the sample size, the signal-to-noise
ratio, and the degree to which the model is multi-scale.

Other than for consistency, the statistical properties of the
alternative estimators cannot be derived analytically. We must
therefore resort to numerical approximations through the setup of Monte
Carlo experiments to obtain the distributional properties of the
multi-scale estimators. A description of the experimental approach and the
results from these experiments is given below.1

MONTE  CARLO  METHODOLOGY 

Since we were able to establish only the consistency of the
multi-scale estimators on an analytical basis, we have had to resort to Monte
Carlo experimentation to determine other statistical properties of the
estimators. The strategy used in these experiments is, for the most
part, dictated by the results derived previously. For example, sample
size may be increased either by increasing the number of subgroups or
by increasing the number of observations per subgroup, and the effects
of these two alternatives may be considerably different. Therefore,
sample composition, as well as sample size, must be an explicit part
of the experimental design. Before dealing with these specifics,
however, we first outline the general approach used in the experiments.

1 For a more complete description of the Monte Carlo methodology
and results, see Cooper [17].

Given the basic multi-scale model,

$$y_{ij} = \alpha_j + \delta_j X_{ij} \beta + \delta_j \varepsilon_{ij},$$

and specified values for the $\alpha$, $\delta$, and $\beta$ vectors and for the X matrix,
we conducted Monte Carlo experiments by simulating values for the $\varepsilon$
vector. The elements of the $\varepsilon$ vector were drawn from a normal
population and the number of cases run for each experiment was determined
according to the number required to yield a stable representation of
the parameter distributions.1
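
One Monte Carlo case can be sketched as follows. This is a minimal illustration in plain Python, assuming the model form $y_{ij} = \alpha_j + \delta_j (X_{ij}\beta + \varepsilon_{ij})$ with independent normal disturbances; the function name `simulate_case` and its argument layout are hypothetical.

```python
import random


def simulate_case(alpha, delta, beta, X, groups, sigma, rng):
    """Draw one Monte Carlo sample of y from the multi-scale model.

    alpha, delta : per-subgroup intercepts and scale factors (lists)
    beta         : list of coefficients
    X            : list of rows of explanatory variables (held fixed)
    groups       : subgroup index j of each row
    sigma        : standard deviation of the normal disturbance
    rng          : random.Random instance supplying the draws
    """
    y = []
    for row, j in zip(X, groups):
        signal = sum(b * v for b, v in zip(beta, row))
        eps = rng.gauss(0.0, sigma)
        # The subgroup scale delta_j multiplies both signal and noise.
        y.append(alpha[j] + delta[j] * (signal + eps))
    return y
```

Repeated calls with the same X, parameters, and random number generator give the successive cases of an experiment, since only the disturbance vector is redrawn.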

A number  of  experiments  were  conducted  for  different  formulations 
of  the  multi-scale  model,  as  indicated  below. 


Model  Specification 

Two specifications of the model were tested, one with two
explanatory variables and one with five explanatory variables. Only the two
variable version is reported here.2 The parameter values were: $\beta_1 = 1.0$
and $\beta_2 = 2.0$. The two explanatory variables were uncorrelated.

Sample  Size 

As noted previously, sample size in the multi-scale model has two
dimensions: the number of observations per subgroup and the number
of subgroups. Accordingly, experiments were conducted for a number
of different sample sizes and configurations. Complete results are
reported for one "larger" sample experiment: five subgroups with 50
observations per subgroup (the so-called 5 x 50 sample). Detailed
results are also reported for one "small" sample situation, five
subgroups with five observations per subgroup (denoted the 5 x 5 sample),
and for one "medium" sample configuration, 50 subgroups with five
observations per subgroup (denoted the 50 x 5 sample). Summary results
are also reported for the following sample sizes (where the first
number shows the number of subgroups and the second shows the number of
observations per subgroup): 10 x 5, 25 x 5, 100 x 5, and 25 x 10.
Finally, experiments were conducted for one sample configuration where
the number of observations per subgroup varied: 50 subgroups with an
average of five observations per subgroup (as few as three and as many
as 10), denoted the 50 x 5 (var) sample.

1 That is, a concern in conducting Monte Carlo experiments is how
many cases must be run before the estimated distributions of the
parameter estimates "reasonably reflect" the true distributions. Although
a precise reflection would require an infinite number of cases, such
an approach is, of course, not feasible. Instead, the procedure was
to run 200 cases for one of the experiments, with a summary printed
every 20 cases. These summary statistics were then examined to
determine where the estimated distributions began to stabilize, that is,
where the addition of another 20 cases did not appreciably change the
estimates of the distributions. For medium to large samples, the
number of cases required was 20; for very small samples, the number of
cases required was 100. The number of cases is reported with the
results. For a more complete description of the approach, see Cooper [17].

2 The five variable results are reported in Cooper [17]. They yield
essentially the same results as the two explanatory variable specification.
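
The procedure for choosing the number of cases, described in the footnote above, relied on examining printed summaries. An automated analogue might look like the following sketch, which assumes a numerical tolerance in place of visual inspection; the function name `run_until_stable` and the tolerance value are hypothetical.

```python
import statistics


def run_until_stable(draw_estimate, batch=20, max_cases=200, tol=0.05):
    """Add cases in batches until the summary statistics stop changing.

    draw_estimate : callable returning one simulated parameter estimate
    batch         : cases added between summaries (20 in the text)
    max_cases     : upper limit on cases (200 in the text)
    tol           : hypothetical tolerance on the change in the mean and
                    standard deviation between successive summaries
    """
    estimates = []
    previous = None
    while len(estimates) < max_cases:
        estimates.extend(draw_estimate() for _ in range(batch))
        summary = (statistics.mean(estimates), statistics.pstdev(estimates))
        if previous is not None and all(
            abs(new - old) <= tol for new, old in zip(summary, previous)
        ):
            break
        previous = summary
    return len(estimates), summary
```

The function returns the number of cases actually run together with the final mean and standard deviation of the estimates.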

Explanatory  Variables 

The explanatory variables were chosen such that the correlation
between the two was zero.1 Two sets of experiments were conducted with
regard to the explanatory variables. In the first, the explanatory
variables were drawn from homogeneous populations; that is, the
explanatory variables for each subgroup came from the same population.
In the second, the explanatory variables were drawn from heterogeneous
populations; that is, the populations from which the explanatory
variables were drawn differed by subgroup. In half of the subgroups, the
standard deviation of the underlying population for each of the
explanatory variables was twice that for the other half of the subgroups.2
These two sets of experiments are referred to as the homogeneous and
heterogeneous cases, respectively.

1 Although the explanatory variables were drawn from two
populations with zero correlation, the sample correlation for the actual
variables used was 0.08.

2 To clarify the procedure, the two explanatory variables for the
homogeneous case were each drawn from a normal population with mean
zero and standard deviation of 10. Once the particular set of homo-

Signal-to-Noise Ratio

The signal-to-noise ratio is based on the "true" model, as given
previously in Eq. (2.2), rather than on the observed model given in
Eq. (2.3). Three different signal-to-noise ratios were tested,
yielding true $R^2$s of 0.1, 0.5, and 0.9.
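
The disturbance variance implied by a target $R^2$ can be computed directly. The sketch below assumes the usual definition $R^2 = \text{signal variance} / (\text{signal variance} + \text{noise variance})$ for the "true" model, which is an assumption made here rather than a definition taken from Eq. (2.2); the function name `noise_variance` is illustrative.

```python
def noise_variance(signal_variance, r_squared):
    """Disturbance variance giving the requested R^2, assuming
    R^2 = signal_variance / (signal_variance + noise_variance)."""
    if not 0.0 < r_squared < 1.0:
        raise ValueError("r_squared must lie strictly between 0 and 1")
    return signal_variance * (1.0 - r_squared) / r_squared


# Signal variance for coefficients (1, 2) and two uncorrelated regressors,
# each with standard deviation 10 (the homogeneous design).
signal_variance = 1.0 ** 2 * 10.0 ** 2 + 2.0 ** 2 * 10.0 ** 2
for r2 in (0.1, 0.5, 0.9):
    print(r2, noise_variance(signal_variance, r2))
```

Under this assumption, an $R^2$ of 0.5 corresponds to a disturbance variance equal to the signal variance.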


$\alpha$ and $\delta$

One $\alpha$ vector was used for all the experiments; the elements of
the $\alpha$ vector were generated from a uniform distribution.1 Four
different $\delta$ vectors were used in the experiments. Each was generated from
a log-normal distribution and normalized such that the geometric mean
equaled one. The only difference among these $\delta$ vectors is the set of
parameters describing the corresponding normal distribution. In each
vector, the mean of its corresponding normal was zero; for $\delta_1$, the
standard deviation was 0.1; for $\delta_2$, it was 0.25; for $\delta_3$, it was 0.5;
and for $\delta_4$, it was 1.0. This yielded four $\delta$ vectors, where the
geometric mean of each was one, but where the variances were 0.04, 0.33,
3.57, and 411.1.2
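
The construction of a $\delta$ vector can be sketched as follows. This is a minimal illustration in plain Python, assuming an unweighted geometric mean and natural logarithms; the function name `make_delta_vector`, the seed, and the choice of 50 subgroups are illustrative.

```python
import math
import random


def make_delta_vector(num_subgroups, log_sd, rng):
    """Draw log-normal scale factors and rescale to geometric mean one.

    log_sd : standard deviation of the underlying normal (mean zero)
    rng    : random.Random instance supplying the draws
    """
    raw = [math.exp(rng.gauss(0.0, log_sd)) for _ in range(num_subgroups)]
    log_gm = sum(math.log(d) for d in raw) / num_subgroups
    return [d / math.exp(log_gm) for d in raw]


rng = random.Random(12345)  # arbitrary seed, for reproducibility only
delta_vectors = {sd: make_delta_vector(50, sd, rng)
                 for sd in (0.1, 0.25, 0.5, 1.0)}
```

Because each vector is drawn once and then held fixed, its sample variance is a descriptive measure of how strongly the rating scales differ across subgroups.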

It will be shown later that the multi-scale estimators are
unaffected by the $\delta$ vector (so long as the geometric mean equals one).


geneous  explanatory  variables  was  drawn,  it  was  used  for  the  remainder 
of the experiments (that is, the Monte Carlo experiments were not
conducted for "random" explanatory variables). In the heterogeneous case,
for half of the subgroups, $x_1$ and $x_2$ were the same as for the
homogeneous case (i.e., drawn from normal populations with mean zero and a
standard deviation of 10). For the other half of the subgroups, $x_1$
and $x_2$ were each drawn from normal populations with mean zero and
standard deviation of 20. Again, once the basic set of heterogeneous
explanatory variables was drawn, it was used for the remainder of the
heterogeneous experiments.

¹Since each of the estimating techniques estimates the α vector
by merely subtracting out the subgroup means, the distribution of the
elements of α does not affect the estimation. A uniform distribution
was chosen for convenience only.

²Again, note that one δ vector was used for any given set of
experiments. The term "variance" is not meant to imply randomness nor
that the δ elements were redrawn.

³For convenience, the particular δ vector used for the multi-scale
estimators was the one where the variance equals 3.57.
However, the variability of the elements of δ determines when it is
worthwhile to use one of the multi-scale estimators rather than just
relying on ordinary least squares.

Estimators 

As shown previously, least squares yields identical results to
maximum likelihood when the δ vector is normalized such that the
geometric mean of the elements equals one; otherwise it yields
inconsistent estimates. Therefore, Monte Carlo results are presented
only for three multi-scale estimators: maximum likelihood (MLE), equal
residual variance (ERV), and equal total variance (ETV). As a basis for
comparison, ordinary least squares with dummy variables for the
subgroup intercepts (OLS-DV) was also used.
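One replication of the experiment for the OLS-DV benchmark can be sketched as follows, assuming the basic model y_ij = α_j + δ_j(β1·x1 + β2·x2 + u) with β1 = 1 and β2 = 2. The noise level `sigma_u`, the range of the uniform α draw, and the function names are assumptions made for illustration. For brevity the sketch draws the explanatory variables inside `simulate`, whereas the experiments reported here hold them fixed across replications. The dummy-variable regression is computed by subtracting subgroup means, which gives the same slope estimates.

```python
import random

def simulate(alpha, delta, beta, n, sigma_u, rng):
    """Draw n observations per subgroup from
    y_ij = alpha_j + delta_j * (beta1*x1 + beta2*x2 + u)."""
    rows = []
    for j, (a, d) in enumerate(zip(alpha, delta)):
        for _ in range(n):
            x1 = rng.gauss(0.0, 10.0)    # homogeneous case: std. dev. of 10
            x2 = rng.gauss(0.0, 10.0)
            u = rng.gauss(0.0, sigma_u)  # sigma_u is an assumed noise level
            rows.append((j, x1, x2, a + d * (beta[0] * x1 + beta[1] * x2 + u)))
    return rows

def ols_dv(rows, J):
    """OLS with subgroup dummies, computed by demeaning within subgroups."""
    totals = [[0.0, 0.0, 0.0, 0] for _ in range(J)]
    for j, x1, x2, y in rows:
        t = totals[j]
        t[0] += x1
        t[1] += x2
        t[2] += y
        t[3] += 1
    s11 = s12 = s22 = s1y = s2y = 0.0
    for j, x1, x2, y in rows:
        t = totals[j]
        d1 = x1 - t[0] / t[3]
        d2 = x2 - t[1] / t[3]
        dy = y - t[2] / t[3]
        s11 += d1 * d1
        s12 += d1 * d2
        s22 += d2 * d2
        s1y += d1 * dy
        s2y += d2 * dy
    det = s11 * s22 - s12 * s12          # solve the 2 x 2 normal equations
    return ((s22 * s1y - s12 * s2y) / det,
            (s11 * s2y - s12 * s1y) / det)

rng = random.Random(0)
J, n = 5, 50
alpha = [rng.uniform(0.0, 10.0) for _ in range(J)]  # assumed range
delta = [1.0] * J                                   # base case: no scale differences
rows = simulate(alpha, delta, (1.0, 2.0), n, 1.0, rng)
b1, b2 = ols_dv(rows, J)
```

Replacing `delta` with a dispersed vector whose geometric mean is one and repeating the draw many times gives the kind of comparison against the multi-scale estimators that the experiments are designed to make.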

The foregoing constitutes the basis of the Monte Carlo experiments.
These experiments are addressed to two principal questions: (1) Which
is the preferred multi-scale estimator under alternative sample and
model configurations? and (2) When is the multi-scale approach to be
preferred to least squares with dummy variables for the intercepts?

ESTIMATOR  DISTRIBUTIONS 

As noted previously, the small sample distributions for the
multi-scale estimators cannot be derived analytically. Since the Monte
Carlo approach just outlined suggests that we examine the multi-scale
estimators under a variety of conditions, it is desirable to simplify
these comparisons as much as possible. In this regard, a useful first
step is to obtain the functional form of the distributions so that the
comparisons can be made in terms of the "sufficient statistics" for the
distributions. Our concern will be with the distribution of β̂, since
α and δ can, for the most part, be regarded as nuisance parameters.
Thus, while β̂ is known to be asymptotically normal, we must rely on
Monte Carlo experiments to demonstrate the small sample distributions.

A 

To generate the distributions of β̂, 1000 cases were run on the
10 x 5 sample, with homogeneous explanatory variables and an R² of
0.5.¹ The results from these experiments, which are given in Table 2,
suggest that the MLE and ETV estimators for β are approximately
normally distributed. The areas shown for both parameters of both
estimators closely approximate the theoretical normal distribution.²
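The theoretical areas against which the estimators are compared follow directly from the standard normal distribution function. A short sketch using only the standard library (the variable names are illustrative):

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Band edges in standard deviations: below -3, -3 to -2, ..., above +3.
edges = [-3, -2, -1, 0, 1, 2, 3]
areas = [phi(edges[0])]
areas += [phi(hi) - phi(lo) for lo, hi in zip(edges, edges[1:])]
areas.append(1.0 - phi(edges[-1]))
```

The eight entries of `areas` correspond, to rounding, to the theoretical column of Table 2. Applying the same bands to the standardized Monte Carlo estimates gives the observed shares.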

The above finding has the important implication that the
multi-scale estimators can be described fully by their means and
variances. The problem of comparing the estimators is correspondingly
simplified. With respect to choosing among the estimators, the
criterion that will be used is that of the minimum mean squared error
(i.e., the sum of the variance and the squared bias).
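The criterion can be computed from a set of Monte Carlo estimates as sketched below. The function name and the three illustrative estimates are assumptions; the variance here uses the divisor n, under which the decomposition into variance plus squared bias is exact.

```python
def mse_parts(estimates, true_value):
    """Return (mean, variance, bias, mean squared error) for a list of estimates."""
    n = len(estimates)
    mean = sum(estimates) / n
    var = sum((e - mean) ** 2 for e in estimates) / n
    bias = mean - true_value
    return mean, var, bias, var + bias ** 2

# Hypothetical estimates of a coefficient whose true value is 1.0.
mean, var, bias, mse = mse_parts([0.9, 1.1, 1.3], 1.0)

# The same identity applied to a reported mean and variance:
# a mean of 1.099 and a variance of 0.193 for a true value of 1.0.
mse_from_summary = 0.193 + (1.099 - 1.0) ** 2
```

The last line reproduces, to three decimals, the mean squared error of 0.203 reported for the MLE estimate of β1 at R² = 0.1 in the 5 x 50 sample.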


SAMPLE  SIZE 

The two dimensions of sample size in the multi-scale model raise
a potentially important distinction for the composition of the sample,
since the number of parameters to be estimated equals 2J + K, where J
is the number of subgroups and K is the number of explanatory
variables.³ Therefore, the more subgroups there are, the more
parameters there are to estimate, so that, for a given number of
observations, there are fewer degrees of freedom and the ratio of
observations to parameters declines. In the discussion below, detailed
results are presented for the 5 x 50 sample and for the 50 x 5 sample.
Summary statistics are then reported for (1) increasing sample size by
adding subgroups and (2) the effect of sample composition holding
sample size constant. These experiments are all based on homogeneous
explanatory variables.
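The parameter count can be made concrete for the two designs examined in detail. A small sketch, with K = 2 explanatory variables as in the experiments (the function name is illustrative):

```python
def design(J, n, K=2):
    """Parameters (2J + K), observations (J * n), and observations per parameter."""
    params = 2 * J + K
    obs = J * n
    return params, obs, obs / params

few_large = design(5, 50)    # 5 subgroups of 50 observations
many_small = design(50, 5)   # 50 subgroups of 5 observations
```

Both designs contain 250 observations, but the first estimates 12 parameters and the second 102, so the ratio of observations to parameters falls from about 21 to about 2.5.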

The 5 x 50 Sample

The Monte Carlo results for five subgroups with 50 observations


¹Although 20 to 100 cases are sufficient to yield reasonably
accurate estimates of the means and variances of the distributions,
1000 cases were required to reflect the entire shape of the
distributions.

²Also shown in Table 2 is a computer generated normal distribution,
based on 1000 cases. This shows how the results from even 1000 cases
can deviate modestly from the theoretical distribution.

³Note that since each subgroup has its own intercept term, there
is no general constant term.
Table 2

DISTRIBUTIONS OF THE MLE AND ETV ESTIMATORS AND
THE NORMAL VARIABLEᵃ

                     Normal                  MLE               ETV
              Actualᵇ   Computer-
                        Generatedᶜ      β̂1       β̂2       β̂1       β̂2
                (1)        (2)          (3)      (4)      (5)      (6)

Mean            0        -0.025        0.983    1.975    1.109    1.455
Std. dev.       1         1.019        0.117    0.165    0.115    0.103

Areas
< -3σ           0.0013    0.0          0.002    0.002    0.005    0.003
-3σ to -2σ      0.0214    0.019        0.023    0.029    0.020    0.023
-2σ to -1σ      0.1360    0.147        0.129    0.123    0.121    0.122
-1σ to 0        0.3413    0.332        0.347    0.357    0.352    0.354
0 to +1σ        0.3413    0.346        0.350    0.335    0.341    0.340
+1σ to +2σ      0.1360    0.134        0.123    0.127    0.140    0.137
+2σ to +3σ      0.0214    0.021        0.026    0.026    0.021    0.020
> +3σ           0.0013    0.001        0.0      0.001    0.0      0.010

ᵃBased on 1000 cases.

ᵇActual probability distribution for a standardized normal variable.

ᶜProbability distribution for a standardized normal variable as
generated from the random normal variable generator on the computer.


per subgroup are given in Table 3. These show the intuitively
appealing result that all three multi-scale estimators have essentially
the same properties in the large sample. This is to be expected for two
reasons. First, all three estimators were shown to be consistent when
the explanatory variables are drawn from homogeneous populations. Since
consistency is defined in terms of increasing sample size holding the
number of subgroups constant, and since the 5 x 50 sample would be
considered "large" by most measures, we would expect the means of the
distributions for all three estimates to be approximately the same.
Table 3

MONTE CARLO RESULTS FOR THE 5 x 50 SAMPLEᵃ

(Five subgroups with 50 observations
per subgroup)

R²     Estimatorᵇ     β̂1ᶜ      SV1      MSE1      β̂2       SV2      MSE2

0.1    MLE           1.099    0.193    0.203    1.957    0.401    0.403
       ERV           1.108    0.197    0.209    1.971    0.405    0.406
       ETV           1.094    0.189    0.198    1.945    0.398    0.401

0.5    MLE           0.991    0.011    0.011    1.995    0.025    0.025
       ERV           0.991    0.011    0.011    1.987    0.025    0.025
       ETV           0.989    0.011    0.011    1.987    0.025    0.025

0.9    MLE           0.989    0.003    0.004    2.010    0.002    0.002
       ERV           0.999    0.003    0.004    2.010    0.002    0.002
       ETV           0.993    0.004    0.004    2.001    0.002    0.002

NOTE: Explanatory variables drawn from a homogeneous population.

ᵃThe basic model is given as
$y_{ij} = \alpha_j + \delta_j (\beta_1 x_{1i} + \beta_2 x_{2i} + u_i)$.
The results are based on 20 cases and homogeneous explanatory
variables.

ᵇMaximum likelihood (MLE), equal residual variance (ERV), and
equal total variance (ETV).

ᶜβ1 = 1.0; β2 = 2.0. β̂i refers to the mean for the experiments;
SVi refers to the variance; and MSEi refers to the mean squared error.


Second, consider the methods of estimation. The ETV estimator
explicitly assumes that differences in the within-subgroup variances
of the dependent variable are due exclusively to differences in the
scale parameters. Since this assumption is in fact correct when the
x's are drawn from homogeneous populations and when the number of
observations per subgroup is large enough to avoid small sample
problems, the ETV estimator provides consistent estimates. Moreover,
since the MLE and ERV estimators are different from ETV only so long as
differences in within-subgroup variances of the dependent variable are
partly attributable to factors other than the scale parameter, and
since this is not the case for homogeneous x's and large samples per
subgroup, we would expect that the MLE and ERV would yield essentially
the same estimates as ETV, as shown in Table 3.

The  net  result,  then,  is  that  all  three  estimators  yield  essen- 
tially equivalent  estimates  in  the  large  sample.  Given  that  ETV  is 
considerably  less  expensive  to  run,  one  would  generally  prefer  to  em- 
ploy ETV  in  the  large  sample  situations  when  the  explanatory  variables 
are  homogeneous  across  subgroups. 

The  50  x 5 Sample 

The  results  for  the  sample  of  50  subgroups  of  five  observations 
each,  given  in  Table  4,  offer  several  interesting  contrasts  to  those 
from  the  5 x 50  sample.  First,  the  variance  of  the  ETV  estimator  is 


Table 4

MONTE CARLO RESULTS FOR THE 50 x 5 SAMPLEᵃ

(50 subgroups with five observations
per subgroup)

R²     Estimatorᵇ     β̂1       SV1      MSE1      β̂2       SV2      MSE2

0.1    MLE           1.094    0.269    0.278    1.832    0.438    0.466
       ERV           1.250    0.399    0.461    2.123    0.610    0.626
       ETV           0.961    0.190    0.192    1.599    0.327    0.488

0.5    MLE           0.962    0.026    0.027    1.926    0.052    0.058
       ERV           1.097    0.036    0.046    2.186    0.064    0.098
       ETV           0.842    0.021    0.046    1.630    0.036    0.173

0.9    MLE           0.979    0.006    0.006    1.983    0.003    0.003
       ERV           1.012    0.009    0.010    2.040    0.003    0.005
       ETV           0.817    0.004    0.038    1.628    0.003    0.141

ᵃSee notes to Table 3.
always equal to or less than that for either MLE or ERV. In fact, this
always holds regardless of sample size, explanatory variables, or model
specification.¹

Second, although ETV always has the smallest variance, it is not
always the preferred estimator on the basis of the mean squared error
criterion. For low R²s, ETV does have a smaller mean squared error
than either MLE or ERV. However, MLE has a much smaller mean squared
error than ETV for high R²s, because although the explanatory
variables were drawn from homogeneous populations, the variables
themselves will almost necessarily be heterogeneous when there are as
few as five observations in a subgroup. Therefore, ETV will be biased
in the small sample, where small sample refers to the number of
observations per subgroup, even though it is consistent, because ETV
attributes all differences in within-subgroup variances in the
dependent variable to the scale parameter when, in fact, some of the
difference is due to differences in the variation of the explanatory
variables. MLE and ERV, however, explicitly take such differences in
within-subgroup explanatory variable variations into account, thus
leading not only to consistent estimates, but to estimates that are
also unbiased.

Although ETV has the smallest variance, it is sufficiently biased
when there are few observations per subgroup that its mean squared
error is larger than that of MLE and ERV for high R²s. Moreover, ETV
does relatively worse as the true R² of the model increases for two
reasons: (1) the bias in ETV increases as R² increases and (2) for
higher R²s, bias plays a relatively more important role in the mean
squared error criterion (since the variance decreases as R²
increases).

Finally,  MLE  is  generally  preferred  to  ERV  since  the  MLE  variance 
tends  to  be  much  less  than  that  for  ERV.  Both  will  be  unbiased,  though. 

The 5 x 50 and 50 x 5 samples yield three important conclusions.
First, the three multi-scale estimators all yield approximately the
same results when there are many observations per subgroup. Second,
when there are few observations per subgroup, ETV has the smallest
variance of the estimators. Third, though, ETV is biased when there are
few observations per subgroup. Although this bias is not sufficient to
counteract the savings in variance for low R²s, it more than offsets
this savings for high R²s, the result being that ETV is preferred for
low R²s and MLE for high R²s.

¹We conjecture that this result occurs because ETV uses a less
complicated procedure for estimating the δ vector.

Sample  Size 

The effects of sample size on the parameter distributions are
necessarily an important question. Of particular importance is the
effect on the parameter estimates when sample size is increased through
the addition of new subgroups, since this will often be the only way of
increasing the sample in situations where the multi-scale model is
applicable. In the standard linear regression model, the effect of
sample size can be solved analytically: (1) the least squares
estimator is unbiased, regardless of sample size, and (2) the variance
of the least squares estimator is inversely proportional to sample
size. In the multi-scale model, however, the result is less clear, for
every time a new subgroup is added, two more parameters are also added.

Summary results, showing the marginal distributions of β̂ for
different sample sizes, holding the number of observations per subgroup
constant, are reported in Table 5.¹ These show the perhaps surprising
result that increasing the sample through the addition of subgroups
reduces the variance of β̂ almost in proportion to the number of
observations, as with the standard linear regression model, even though
each additional subgroup adds two parameters.² That is, although the
addition of more parameters somewhat reduces the benefit of the
additional observations, this reduction is modest. This is an important


¹For simplicity in presentation, Table 5 reports the sum of mean
squared errors for β̂1 and β̂2, rather than the separate mean squared
errors. Note that we can perform this simple addition since x1 and x2
are uncorrelated.

²This holds for MLE and ERV since both are unbiased. It is also
approximately true for ETV at R² = 0.1, since the bias for ETV is small.
For higher R²s, however, this does not hold for ETV, since the bias
in the ETV estimator at higher R²s does not fall as more subgroups are
added.
Table 5

SUMMARY MONTE CARLO RESULTS FOR HOMOGENEOUS
EXPLANATORY VARIABLES: MEAN SQUARED
ERROR AS A FUNCTION OF SAMPLE SIZE

                          MSE1 + MSE2 for Sample Sizeᵇ

R²     Estimatorᵃ     5 x 5    10 x 5    25 x 5    50 x 5    100 x 5

0.1    MLE            4.670    3.485     1.268     0.744     0.336
       ERV            9.493    5.220     1.718     1.087     0.621
       ETV            3.411    2.860     1.093     0.680     0.331

0.5    MLE            0.535    0.406     0.179     0.085     0.040
       ERV            1.008    0.563     0.326     0.144     0.108
       ETV            0.501    0.558     0.271     0.219     0.196

0.9    MLE            0.093    0.041     0.013     0.009     0.007
       ERV            0.132    0.044     0.020     0.015     0.007
       ETV            0.360    0.283     0.178     0.179     0.201

ᵃSee note b, Table 3.

ᵇMean squared error for β̂1 plus mean squared error for β̂2.
Results for 5 x 5 and 10 x 5 samples are based on 100 cases; results
for 25 x 5, 50 x 5, and 100 x 5 samples are based on 20 cases.


practical result since, as noted above, the only means of adding more
observations in situations where the multi-scale model is the
appropriate specification may be through the addition of more
subgroups.
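A rough check of this near-proportionality uses the MLE row of Table 5 at R² = 0.1. If the mean squared error fell exactly in proportion to the number of observations, the product of the two would be constant. The sketch below only rearranges the reported figures; the variable names are illustrative.

```python
# Observations in the 5 x 5, 10 x 5, 25 x 5, 50 x 5, and 100 x 5 samples.
observations = [25, 50, 125, 250, 500]

# MSE1 + MSE2 for MLE at R² = 0.1, as reported in Table 5.
mse_mle = [4.670, 3.485, 1.268, 0.744, 0.336]

# Under exact inverse proportionality these products would all be equal.
products = [n * m for n, m in zip(observations, mse_mle)]

# Overall fall in mean squared error for a twentyfold increase in sample size.
reduction = mse_mle[0] / mse_mle[-1]
```

The products are 116.75, 174.25, 158.5, 186.0, and 168.0, and the mean squared error falls by a factor of about 14 as the sample grows twentyfold. Since the two smallest samples are based on 100 cases and the others on 20, the comparison is only indicative.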

Sample  Composition 

Finally, consider the effect of sample composition, holding the
number of observations constant. It is clear from the results shown
in Table 6 that the more subgroups there are (and, hence, the more
parameters), the less precise are the estimates of β. Yet, with the
exception of ETV, which becomes severely biased as subgroup size is
reduced, the effects of sample composition are not as large as one
might expect. Note further that most of the gain from increasing
subgroup size, again with the exception of ETV, occurs when the subgroup
Table 6

SUMMARY MONTE CARLO RESULTS FOR HOMOGENEOUS
EXPLANATORY VARIABLES: MEAN SQUARED ERROR
AS A FUNCTION OF SAMPLE COMPOSITION

                       MSE1 + MSE2 for Sample Sizeᵇ

R²     Estimatorᵃ     5 x 50    25 x 10    50 x 5    50 x 5 (var)

0.1    MLE            0.606     0.507      0.744     n.a.
       ERV            0.615     0.561      1.087     n.a.
       ETV            0.599     0.485      0.680     n.a.

0.5    MLE            0.036     0.039      0.085     0.089
       ERV            0.036     0.049      0.144     0.109
       ETV            0.036     0.074      0.219     0.250

0.9    MLE            0.006     0.006      0.009     0.017
       ERV            0.005     0.006      0.015     0.010
       ETV            0.006     0.047      0.179     0.220

ᵃSee note b, Table 3.

ᵇSee note b, Table 5.

size  is  increased  from  five  to  ten  observations.  The  estimates  are 
quite  unaffected  if  the  subgroup  size  is  variable.  In  fact,  ERV  ac- 
tually does  better  in  the  variable  subgroup  size  sample  than  in  the 
constant  subgroup  size  sample,  thus  suggesting  that  it  benefits  more 
from  the  introduction  of  a few  large  subgroups  than  it  is  hurt  by  the 
presence  of  very  small  subgroups. 


HETEROGENEOUS  EXPLANATORY  VARIABLES 

When the explanatory variables are not homogeneous across all
subgroups, ETV yields inconsistent estimates of the parameters. The
reason is obvious. ETV attributes all differences in within-subgroup
variances of the dependent variable to the scale parameter. However,
when the explanatory variables themselves are heterogeneous across
subgroups, this is not appropriate. MLE and ERV will still be
consistent when the explanatory variables are heterogeneous.
Monte Carlo results for the 5 x 50 sample with heterogeneous
explanatory variables are given in Table 7. They show the expected
finding that ETV is biased and inconsistent when the explanatory
variables are heterogeneous, but that MLE and ERV are both unbiased. As
a result, both MLE and ERV are generally to be preferred to ETV. For
very low R²s, ETV is to be preferred in spite of its inconsistency,
since in the 5 x 50 sample the smaller variance for ETV more than
offsets its bias.


Table 7

MONTE CARLO RESULTS FOR HETEROGENEOUS
EXPLANATORY VARIABLES: 5 x 50 SAMPLE

                                β1                          β2
R²     Technique      β̂1       SV1      MSE1      β̂2       SV2      MSE2

0.1    MLE           1.175    0.120    0.150    1.944    0.423    0.427
       ERV           1.183    0.121    0.154    1.956    0.434    0.436
       ETV           1.091    0.109    0.117    1.833    0.350    0.378

0.5    MLE           0.981    0.009    0.010    2.005    0.035    0.035
       ERV           0.983    0.010    0.010    2.011    0.034    0.034
       ETV           0.775    0.005    0.056    1.651    0.018    0.140

0.9    MLE           0.992    0.004    0.004    2.004    0.003    0.003
       ERV           1.007    0.006    0.006    2.028    0.006    0.007
       ETV           0.678    0.002    0.105    1.470    0.001    0.282

The 50 x 5 sample shown in Table 8 yields largely the same results.
ETV is very biased for medium to high R²s, so that MLE and ERV are
again generally preferred to ETV. As before, MLE yields better
estimates than ERV. In general, these results illustrate the importance
of heterogeneity in the explanatory variables.

THE  SCALE  PARAMETER 

The scale parameter δ is clearly what distinguishes the
multi-scale model from the classical linear regression model.
Therefore, δ determines when it is appropriate to use one of the
multi-scale
Table 8

MONTE CARLO RESULTS FOR HETEROGENEOUS
EXPLANATORY VARIABLES: 50 x 5 SAMPLE

R²     Technique      β̂1       SV1      MSE1      β̂2       SV2      MSE2

0.1    MLE           1.138    0.182    0.201    1.810    0.188    0.199
       ERV           1.311    0.285    0.382    2.155    0.220    0.244
       ETV           0.924    0.122    0.128    1.543    0.117    0.326

0.5    MLE           0.956    0.016    0.018    1.948    0.047    0.049
       ERV           1.028    0.022    0.023    2.123    0.052    0.067
       ETV           0.686    0.009    0.107    1.286    0.017    0.527

0.9    MLE           0.985    0.004    0.005    1.984    0.003    0.003
       ERV           1.103    0.005    0.006    2.073    0.006    0.012
       ETV           0.604    0.002    0.158    1.092    0.001    0.825

estimators or ordinary least squares (with dummy variables for the
intercepts). In general, one would expect that, as the variance of
the scale parameter increases, the desirability of using one or more
of the multi-scale estimators (over ordinary least squares) also
increases. Conversely, as the variance of the scale parameter becomes
smaller, one would expect the ordinary least squares estimator to do
better. (Indeed, in the limit when all of the δs equal one identically,
we know from the Gauss-Markov theorem that ordinary least squares is
the "best" estimator.)

So long as the geometric mean of the scale parameter equals one,
the multi-scale estimators are unaffected by the variability of the
scale parameter. Therefore, the multi-scale estimators can be examined
independently of the scale parameter, for a given normalization rule.
In contrast, OLS-DV clearly depends on the variability of the scale
parameter. To determine the sensitivity of OLS-DV to the scale
parameter specification, four δ vectors were generated, as noted
earlier. Each of these can be described in terms of the variability
of the elements. Finally, a fifth δ vector with all the elements set
identically to one is used as a base case.

The Monte Carlo results from experiments conducted on these
different δ vectors are shown in Table 9. They show the expected result
that as the elements of δ become more widely dispersed (i.e., as the
variance of δ increases), OLS-DV yields poorer estimates. These
results also suggest that there is a tradeoff, or efficiency frontier,
that defines the appropriate estimator to use. For example, OLS-DV
is clearly the appropriate estimator when all the δs are set
identically to one. That is, there is no need to employ the multi-scale
model when the true model is not in fact multi-scale.

As the δs diverge from unity, it begins to pay to use one of the
multi-scale techniques. In particular, for low R²s one will want to
use ETV when the variance increases to somewhere between 0.0 and 0.04.
For high R²s (in the 50 x 5 sample), one will want to use MLE when the
variance of δ is not much larger than zero.
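The crossover points can be read off the 50 x 5 panel of Table 9 by picking, for each value of the variance of δ, the estimator with the smallest summed mean squared error. The sketch below only tabulates the reported figures; the data layout and the function name `preferred` are assumptions made for illustration.

```python
# Variance of the δ vector for the five OLS-DV experiments (50 x 5 sample).
v_delta = [0.0, 0.04, 0.33, 3.57, 411.0]

# MSE1 + MSE2 for OLS-DV at each variance, by true R² (Table 9).
ols_dv = {
    0.1: [0.674, 0.717, 1.028, 5.587, 552.8],
    0.5: [0.061, 0.068, 0.128, 1.709, 198.7],
    0.9: [0.007, 0.011, 0.050, 1.561, 195.8],
}

# Multi-scale estimators do not depend on the variance of δ (Table 9).
multi_scale = {
    0.1: {"MLE": 0.744, "ERV": 1.087, "ETV": 0.680},
    0.5: {"MLE": 0.085, "ERV": 0.144, "ETV": 0.219},
    0.9: {"MLE": 0.009, "ERV": 0.015, "ETV": 0.179},
}

def preferred(r2):
    """Estimator with the smallest summed MSE at each variance of δ."""
    choices = []
    for v, ols in zip(v_delta, ols_dv[r2]):
        candidates = dict(multi_scale[r2])
        candidates["OLS-DV"] = ols
        choices.append((v, min(candidates, key=candidates.get)))
    return choices

frontier = {r2: preferred(r2) for r2 in ols_dv}
```

On these figures OLS-DV is preferred only at a variance of zero when R² is 0.1 or 0.9, with ETV and MLE respectively taking over by 0.04, while at R² = 0.5 OLS-DV remains preferred at 0.04 and MLE takes over by 0.33.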


SUMMARY

The Monte Carlo results allow us to assess the performance of
several competing estimators when the true model is multi-scale in
nature. They show that no single estimator is dominant over the entire
range of possible sample sizes and model specifications. Instead, the
appropriateness of any single estimator depends on a number of factors:
(1) sample size, (2) sample composition, (3) signal-to-noise ratio,
(4) the degree of heterogeneity in the explanatory variables, and
(5) the degree to which the multi-scale model is multi-scale. These
can be combined to


That is, for any given δ vector, say δ_i, the variance of the δs
is given simply by

$$V_\delta = \frac{1}{J} \sum_{j=1}^{J} \left( \delta_{ij} - \bar{\delta}_i \right)^2 .$$

The δ vectors used in Table 9 were each generated from log-normal
distributions. The reason that V_δ in the 5 x 50 sample differs from
V_δ in the 50 x 5 sample is simply that five δs were drawn from each
vector in the first case, while 50 were drawn in the second.
Table 9

MONTE CARLO RESULTS: COMPARISONS OF MEAN SQUARED
ERRORS FOR OLS-DV AND MULTI-SCALE ESTIMATORS

(Homogeneous explanatory variables)

                                                         MSE1 + MSE2 for
              MSE1 + MSE2 for OLS-DV                       Multi-Scale

Sample 5 x 50

R²     Vδ=0.0   Vδ=0.0[?]   Vδ=0.24   Vδ=1.68   Vδ=35.3    MLE     ERV     ETV

0.1    0.535    0.645       0.860     2.100     49.01     0.606   0.615   0.599
0.5    0.032    0.037       0.095     1.160     41.51     0.036   0.036   0.036
0.9    0.005    0.008       0.061     1.070     19.54     0.006   0.005   0.006

Sample 50 x 5

R²     Vδ=0.0   Vδ=0.04     Vδ=0.33   Vδ=3.57   Vδ=411     MLE     ERV     ETV

0.1    0.674    0.717       1.028     5.587     552.8     0.744   1.087   0.680
0.5    0.061    0.068       0.128     1.709     198.7     0.085   0.144   0.219
0.9    0.007    0.011       0.050     1.561     195.8     0.009   0.015   0.179

yield regions where particular estimators are to be preferred, such as
that shown earlier in Fig. 2 for the 50 x 5 sample.

The  Monte  Carlo  results  do  enable  us  to  make  the  following  general 
statements. 


• Ordinary  least  squares  with  dummy  variables  is,  of  course, 
appropriate  when  the  model  is  not  multi-scale.  It  is  also 
preferred  when  the  degree  to  which  the  model  is  multi-scale 
is  very  small  (even  though  OLS  is  inconsistent). 

• Maximum likelihood always does reasonably well. It is always
consistent, and for medium to large R²s it is the best estimator
when there are few observations per subgroup.

• Equal residual variance, though always consistent, is
never the preferred estimator. Though it generally does
reasonably well, it sometimes yields estimates with larger
variance, particularly with low R².

• Equal total variance sometimes does very well and sometimes
very poorly. It always yields estimates with the least variance
and is generally preferred for low R²s. When the explanatory
variables are drawn from heterogeneous populations, it yields
inconsistent estimates, an inconsistency that becomes severe at
medium to high R²s. Even when the explanatory variables are drawn
from homogeneous populations, the heterogeneity that occurs
when subgroup sizes are small makes the bias more than
large enough to offset any savings in variance.

These  statements  provide  some  practical  guidelines  for  the  appli- 
cation of  the  multi-scale  estimators.  To  begin  with,  the  within- 
subgroup  standard  deviation  of  the  dependent  variable  should  be  calcu- 
lated to  determine  whether  the  model  appears  to  be  multi-scale.  If 
the  within-subgroup  standard  deviations  are  normalized  such  that  their 
geometric  mean  equals  one,  then  the  variance  of  these  standard  devia- 
tions can  be  determined.  If  this  variance  exceeds  about  1.0  for  very 
small  samples  or  about  0.05  for  large  samples,  then  it  probably  pays 
to  use  the  multi-scale  model. 

Second, the ETV estimates should be calculated. If the R² is
small, then ETV is probably the best estimator. If the R² is moderate
to large, then the explanatory variables should be examined to
determine whether they are homogeneous. If not, then the maximum
likelihood estimates should be computed and used.
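The first step of these guidelines can be sketched as follows. The function name, the use of the sample standard deviation (divisor n - 1) within each subgroup, and the two-subgroup data are assumptions made for illustration; the guidelines themselves do not fix these details.

```python
import math
import statistics

def scale_dispersion(groups):
    """Within-subgroup standard deviations, normalized to geometric mean one,
    together with the variance (divisor J) of the normalized values."""
    sds = [statistics.stdev(values) for values in groups]
    geometric_mean = math.exp(sum(math.log(s) for s in sds) / len(sds))
    normalized = [s / geometric_mean for s in sds]
    return normalized, statistics.pvariance(normalized)

# Hypothetical ratings for two subgroups, the second twice as dispersed.
normalized, dispersion = scale_dispersion([[0.0, 2.0], [0.0, 4.0]])
```

Here `dispersion` is 0.125. The resulting figure would then be compared with the rough thresholds given above (about 1.0 for very small samples, about 0.05 for large samples) before deciding whether the multi-scale model is worth using.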
VI. AN APPLICATION OF THE MULTI-SCALE MODEL


The  multi-scale  model  can  be  put  into  some  perspective  by  apply- 
ing it  to  an  actual  estimation  problem.  The  example  used  here—Gay's 
problem  of  estimating  the  cost  of  on-the-job  training  for  airmen— 
actually  served  as  the  genesis  of  the  multi-scale  model. 

In  trying  to  estimate  the  cost  of  on-the-job  training  for  first- 
term  aircraft  maintenance  specialists  in  the  Air  Force,  Gay  had  to 
rely  on  supervisory  estimates  of  the  individual's  productivity  during 
his first term of duty. As shown in Table 10, in addition to the
familiar problem of supervisory bias (i.e., the location effect, α),
the  variances  of  the  dependent  variable  differed  considerably  accord- 
ing to  subgroup,  thus  pointing  toward  the  applicability  of  the  multi- 
scale  model. 


Table 10

SUBGROUP MEANS AND STANDARD DEVIATIONS FOR GAY'S STUDY

            Number of                   Standard             δ̂
Subgroup    Observations    Mean        Deviation    ETV     MLE     ERV

 1           4              $ 3,461     $   254      0.27    0.27    0.28
 2           3              [illegible]   1,987      2.08    1.62    1.37
 3           3               11,017       4,297      4.49    3.69    3.48
 4           3                6,359       3,223      3.37    3.25    2.87
 5           7                3,176         214      0.22    0.25    0.25
 6           5                5,314         409      0.37    0.35    0.33
 7           3                5,596         985      1.03    0.96    0.86
 8           3                2,514         624      0.65    0.55    0.68
 9           4                3,634       1,092      1.14    0.80    0.74
10           6                7,690       1,736      1.82    1.87    1.80
11           8               12,225       2,593      2.71    3.05    3.45
12          11                8,084         732      0.77    0.88    0.98

δ̄ (mean)                                             1.58    1.46    1.42
Vδ (variance)                                        1.69    1.41    1.33

SOURCE: Robert M. Gay, Estimating the Cost of On-the-Job
Training in Military Occupations: A Methodology and Pilot
Study, The Rand Corporation, R-1351-ARPA, April 1974, p. 28.
As can be seen in Table 10, Gay's sample consisted of 64
observations spread across 12 subgroups. The variance of δ̂ implied by
the ETV estimator (i.e., the simplest transformation of the data) is
1.69, which suggests that OLS-DV will not yield the best results.
Therefore, the multi-scale model is probably appropriate.
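The figure of 1.69 can be reproduced from the ETV column of Table 10 by taking the variance of the twelve scale estimates about their arithmetic mean, with divisor J = 12. The variable names below are illustrative.

```python
# ETV estimates of the scale parameter for the 12 subgroups (Table 10).
delta_etv = [0.27, 2.08, 4.49, 3.37, 0.22, 0.37,
             1.03, 0.65, 1.14, 1.82, 2.71, 0.77]

J = len(delta_etv)
mean_delta = sum(delta_etv) / J
var_delta = sum((d - mean_delta) ** 2 for d in delta_etv) / J
```

Rounded to two decimals, `mean_delta` is 1.58 and `var_delta` is 1.69, matching the summary rows of the table.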

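Two primary specifications were considered:

$$\text{COST}_i = \beta_1 \cdot \text{EXP}_i + \beta_2 \cdot \text{ED}_i + \beta_3 \cdot \text{APT}_i + \beta_4 \cdot S_i + \beta_5 \cdot W_i + \varepsilon_i \qquad (6.1)$$

and

$$\text{COST}_i = \beta_1 \cdot \text{EXP}_i + \beta_2 \cdot \text{ED}_i + \beta_3 \cdot \text{APT}_i + \beta_4 \cdot S_i + \beta_5 \cdot W_i + \beta_6 \cdot \text{TECH}_i + \varepsilon_i \qquad (6.2)$$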


where  COST^ 
EXPj 
ED. 
APT. 


W, 


TECH . 


cost  of  on-the-job  training  for  the  ith  individual, 
years  of  possible  civilian  job  experience, 
years  of  education, 

percentile  score  on  mechanical  aptitude  test, 
dummy  variable  for  whether  the  individual  is  from 
the  north  (equals  1 if  from  the  south  and  zero 
otherwise) , 

dummy  variable  for  whether  the  individual  is  white 
(equals  1 if  white  and  zero  otherwise), 
percentile  score  on  the  performance  test  in  tech- 
nical training  and, 
error  term. 
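The two specifications differ only in whether TECH enters as a sixth regressor. A minimal sketch of how the regressor matrix for either equation might be assembled is given below. The record field names (`exp`, `ed`, `apt`, `region`, `race`, `tech`) are hypothetical and are not taken from Gay's data files.

```python
import numpy as np

def design_row(rec, include_tech=False):
    """One row of regressors for Eq. (6.1), or Eq. (6.2) if include_tech."""
    row = [
        rec["exp"],                                  # EXP: years of experience
        rec["ed"],                                   # ED: years of education
        rec["apt"],                                  # APT: aptitude percentile
        1.0 if rec["region"] == "south" else 0.0,    # S dummy
        1.0 if rec["race"] == "white" else 0.0,      # W dummy
    ]
    if include_tech:
        row.append(rec["tech"])                      # TECH: training percentile
    return row

def design_matrix(records, include_tech=False):
    """Stack the rows into a T x 5 (or T x 6) array."""
    return np.array([design_row(r, include_tech) for r in records], dtype=float)
```

Neither equation carries a separate intercept in this sketch, which matches the coefficient columns reported in Table 11.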


The results are given in Table 11. From the results in Table 10, we
know that the OLS-DV is not appropriate. Given the $R^2$ shown for
Eq. (6.1) in Table 11, ETV yields probably the best estimates. However,
the $R^2$ for version (6.2) is probably sufficient to warrant use of
MLE. Moreover, there is also considerable heterogeneity in the
explanatory variables among subgroups. This will tend to make ML
estimates even more attractive. Finally, it is interesting to note
that all three multi-scale estimators yield similar coefficient
estimates, estimates that differ considerably from the OLS results.

                                Table 11

        ESTIMATES FOR GAY'S OJT MODEL: EQS. (6.1) AND (6.2)

Equation  Estimator    EXP       ED       APT        S         W       TECH     R^2

(6.1)     OLS-DV     -95.79   -692.7    -35.6     287.7     294.4               0.17
                     (199.6)  (270.6)  (17.75)   (504.3)   (534.8)
          MLE        -28.29   -406.4     24.90    -65.78    483.2               0.26
                     (104.2)  (141.3)   (9.26)   (263.2)   (279.2)
          ERV         -6.45   -424.2    -25.16   -110.0     464.4               0.27
                     (104.4)  (141.6)   (9.28)   (253.8)   (279.[illegible])
          ETV        -38.52   -364.6    -22.08    -15.41    467.1               0.21
                     (105.2)  (142.7)   (9.36)   (265.9)   (282.0)

(6.2)     OLS-DV    -129.3    -662.6    -19.81    205.5     550.0    -89.63     0.21
                     (197.6)  (267.1)  (19.85)   (499.0)   (548.4)   (53.55)
          MLE        -34.55   -379.5    -10.15   -127.8     724.3    -80.79     0.36
                     (99.57)  (134.6)  (10.00)   (251.5)   (276.4)   (26.99)
          ERV         52.01   -358.4     -5.82   -194.9     804.5   -112.7      0.42
                     (104.5)  (141.3)  (10.50)   (263.9)   (290.0)   (28.32)
          ETV        -62.80   -342.8    -10.67    -74.95    652.4    -64.95     0.28
                     (101.9)  (137.7)  (10.23)   (257.3)   (282.7)   (27.61)

NOTE: Estimated standard errors in parentheses.

The estimated standard errors appearing in Table 11 are derived
from the estimate $\hat{\sigma}^2 (X'X)^{-1}$ rather than from the more
general information matrix that takes into account all parameter
values. Thus, these estimated standard deviations are conditional on
estimates of $\delta$. The only difference between these standard errors
and the OLS errors is that $\hat{\sigma}$ is calculated on the basis of
$T - 2J - K$ degrees of freedom rather than $T - K - 1$ degrees of freedom
as in OLS. It is possible to calculate the more general
variance-covariance matrix of the coefficients by calculating and
storing the matrix values $X_j'X_j$ for each subgroup j during the
processing of the data. However, the estimates of
$\mathrm{Var}(\hat{\beta} - \beta)$ are not a byproduct of obtaining
$\hat{\beta}$ as in the case of OLS.
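The conditional standard errors described above can be sketched as follows. This is only an illustration under stated assumptions: the scale estimates `delta` are taken as given, `y` and `X` are assumed to be already adjusted for the location effects, and the function name `conditional_std_errors` is hypothetical. The only departure from an ordinary regression printout is the divisor $T - 2J - K$.

```python
import numpy as np

def conditional_std_errors(y, X, groups, delta):
    """Standard errors from sigma^2 (X'X)^{-1}, conditional on delta.

    delta[k] is the scale factor of the k-th subgroup label in sorted
    order. Degrees of freedom are T - 2J - K, as stated in the text.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    labels, g = np.unique(np.asarray(groups), return_inverse=True)
    T, K = X.shape
    J = len(labels)
    z = y / np.asarray(delta, dtype=float)[g]      # rescaled dependent variable
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)   # least squares on z
    resid = z - X @ beta
    dof = T - 2 * J - K
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov)), dof
```

Because the matrix $(X'X)^{-1}$ ignores the sampling variation in $\hat{\delta}$, these values should be read as conditional on the scale estimates, as the text notes.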
VII. CONCLUDING REMARKS


The  multi-scale  model  offers  a potentially  valuable  tool  for 
analyzing  manpower  problems  where  the  variables  of  concern  are  obtained 
by  subjective  measurement,  such  as  supervisory  ratings  of  individual 
performance.  Specifically,  it  is  suggested  that  measures  obtained  by 
subjective  evaluations,  such  as  supervisory  ratings,  may  include  two 
types  of  biases:  the  location  bias  and  the  scale  bias.  Although  the 

location  bias  can  be  handled  in  multiple  regression  models  through 
familiar  dummy  variable  techniques,  the  scale  bias  poses  special  esti- 
mation problems  and  necessitates  the  development  of  a multi-scale 
estimator. 

Of  the  several  multi-scale  estimating  procedures  developed,  three 
are  found  to  be  appropriate  for  real-world  applications:  (1)  OLS  (with 

dummy  variables),  (2)  ETV,  and  (3)  MLE.  Although  maximum  likelihood  is 
the  only  one  of  these  three  procedures  always  to  yield  consistent  esti- 
mates, OLS  and  ETV  may  be  more  appropriate  from  an  efficiency  stand- 
point when  consistency  is  less  of  a concern — e.g.,  in  small  sample 
situations.  In  particular,  OLS  is  the  appropriate  technique  in  small 
samples  when  the  true  model  is  only  "modestly"  multi-scale;  ETV  is 
appropriate  when  there  is  a scale  problem  but  the  R2  is  small;  and  MLE 
is  appropriate  when  there  is  a scale  problem  and  when  the  R2  is  moder- 
ate to  large. 

The resulting estimates are useful in two respects. First, the
multi-scale approach allows the analyst to estimate the parameters in
multiple regression models when the dependent variable is subject to
the scale transformation. Second, these estimates can then be used to
construct corrected estimates of the dependent variable.

Some limitations of the multi-scale approach point to possible
directions for future research. First, the multi-scale model, as we
have structured it, is not necessarily appropriate for all problems in
which the dependent variable is obtained through subjective
evaluation. In particular, our formulation requires cardinal measures
of the dependent variable, not the ordinal measures one often finds on
supervisory evaluation forms. When cardinal measures are available,
though, the multi-scale approach is probably worth investigating, as
implied by our analysis of Gay's model. Second, like other techniques,
the multi-scale approach is not valid for cases in which the
measurement bias is selective. Indeed, the essence of the multi-scale
approach rests in the notion that the measurement bias is consistent
and systematic within subgroups.


As noted previously, the multi-scale model may be appropriate in
other cases where the data fall into natural groupings.

Finally,  the  problem,  as  it  has  been  structured  here,  allows  for 
only  one  measured  observation  per  "true"  observation — e.g.,  one  super- 
visory rating  per  individual.  Sometimes,  though,  there  may  be  several 
subjective  evaluations  (i.e.,  measured  observations)  for  each  true  ob- 
servation, such  as  several  supervisors  rating  one  individual.1  It 
would  therefore  be  desirable  to  extend  the  basic  multi-scale  framework 
to  allow  the  multiple  observation  case. 


This  is  the  case  in  Gay's  current  work. 




Appendix 

MAXIMUM  LIKELIHOOD  AND  LEAST  SQUARES  ESTIMATES 


This appendix derives estimates of the multi-scale model, assuming
that the vector of location parameters $\alpha$ is zero. Thus,

$$y_{ij} = \delta_j (X_{ij}\beta + \varepsilon_{ij}), \qquad i = 1, \ldots, T_j; \quad j = 1, \ldots, J. \tag{A.1}$$

$X_{ij}$ may be a vector everywhere equal to unity. The side condition
is a strictly separable function

$$G(\delta) = 0 \tag{A.2}$$

with a particular form

$$\sum_{j=1}^{J} T_j \ln \delta_j = 0. \tag{A.3}$$

We write the parameter vector $\theta$ as

$$\theta = (\beta, \delta, \sigma^2).$$

MAXIMUM  LIKELIHOOD  ESTIMATES 

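ML estimation utilizes the property that the $\varepsilon_{ij}$ are normal
independent deviates with $E\varepsilon = 0$ and
$E\varepsilon\varepsilon' = \sigma^2 I$. The likelihood function can be
written

$$L = \prod_{j=1}^{J} \prod_{i=1}^{T_j} \left(2\pi\sigma^2\delta_j^2\right)^{-1/2} \exp\left[-\frac{1}{2\sigma^2\delta_j^2}\left(y_{ij} - \delta_j X_{ij}\beta\right)^2\right] \tag{A.4}$$

$$\phantom{L} = \left(2\pi\sigma^2\right)^{-T/2} \prod_{j=1}^{J} \delta_j^{-T_j} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{T_j}\left(\frac{y_{ij}}{\delta_j} - X_{ij}\beta\right)^2\right]. \tag{A.5}$$

ML estimates are found through maximization of the logarithm of the
likelihood function, adding a term with a Lagrangian multiplier to
account for the side condition (A.2):

$$\log L = -\frac{T}{2}\ln 2\pi\sigma^2 - \sum_{j} T_j \ln \delta_j - \frac{1}{2\sigma^2}\sum_{j}\sum_{i}\left(\frac{y_{ij}}{\delta_j} - X_{ij}\beta\right)^2 + \lambda G(\delta). \tag{A.6}$$


FIRST-ORDER CONDITIONS

The ML estimates $\hat{\theta}$ are solutions to the equations

$$\frac{\partial}{\partial\theta}\log L = 0. \tag{A.7}$$

Partial differentiation yields four sets of equations.

$$\frac{\partial}{\partial\beta}\log L = \frac{1}{\sigma^2}\sum_{j}\sum_{i} X_{ij}'\left(\frac{y_{ij}}{\delta_j} - X_{ij}\beta\right) = 0 \tag{A.8.1}$$

$$\frac{\partial}{\partial\sigma^2}\log L = -\frac{T}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{j}\sum_{i}\left(\frac{y_{ij}}{\delta_j} - X_{ij}\beta\right)^2 = 0 \tag{A.8.2}$$

$$\frac{\partial}{\partial\delta_j}\log L = -\frac{T_j}{\delta_j} + \frac{1}{\sigma^2}\sum_{i}\left(\frac{y_{ij}}{\delta_j} - X_{ij}\beta\right)\frac{y_{ij}}{\delta_j^2} + \lambda\frac{\partial G}{\partial\delta_j} = 0, \qquad j = 1, \ldots, J \tag{A.8.3}$$

$$\frac{\partial}{\partial\lambda}\log L = G(\delta) = 0 \tag{A.8.4}$$

The first two conditions yield ML estimates to the classical normal
model:

$$\hat{\beta} = (X'X)^{-1}X'z \tag{A.9}$$

and

$$\hat{\sigma}^2 = \frac{1}{T}\sum_{j}\sum_{i}\left(z_{ij} - X_{ij}\hat{\beta}\right)^2, \tag{A.10}$$

where

$$z_{ij} = \frac{y_{ij}}{\hat{\delta}_j}.$$

For condition (A.8.3), where the logarithmic form (A.3) of the
side conditions is used, the equation may be written

$$\frac{1}{T_j}\sum_{i}\left(\frac{y_{ij}}{\delta_j} - X_{ij}\beta\right)\frac{y_{ij}}{\delta_j} = (1 - \lambda)\sigma^2, \qquad j = 1, \ldots, J. \tag{A.11}$$

Equation (A.11) is a quadratic equation in $1/\delta_j$. Only the positive
root, however, satisfies the nonnegativity conditions for $\delta$.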
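The first-order conditions above suggest a simple alternating scheme, sketched below under stated assumptions. It is an illustration, not the report's own program. The function name `fit_multiscale_ml` and the iteration controls are hypothetical. The sketch sets $\lambda = 0$ in (A.11). This appears to follow from summing (A.11) over subgroups with weights $T_j$ and applying the normal equations behind (A.9), which gives $(1 - \lambda)\sigma^2 = \sigma^2$. After each pass the scale factors are rescaled so that side condition (A.3) holds.

```python
import numpy as np

def fit_multiscale_ml(y, X, groups, n_iter=500, tol=1e-10):
    """Alternate between (A.9)-(A.10) and the positive root of (A.11)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    labels, g = np.unique(np.asarray(groups), return_inverse=True)
    J = len(labels)
    T_j = np.bincount(g, minlength=J)
    delta = np.ones(J)
    for _ in range(n_iter):
        z = y / delta[g]
        beta, *_ = np.linalg.lstsq(X, z, rcond=None)       # (A.9)
        fit = X @ beta
        sigma2 = np.mean((z - fit) ** 2)                   # (A.10)
        # (A.11) with lambda = 0 is a*u^2 - b*u - c = 0 in u = 1/delta_j.
        a = np.bincount(g, weights=y * y, minlength=J)
        b = np.bincount(g, weights=y * fit, minlength=J)
        c = T_j * sigma2
        u = (b + np.sqrt(b * b + 4.0 * a * c)) / (2.0 * a)  # positive root
        new = 1.0 / u
        # Impose side condition (A.3): sum_j T_j ln(delta_j) = 0.
        new /= np.exp(np.sum(T_j * np.log(new)) / T_j.sum())
        converged = np.max(np.abs(new - delta)) < tol
        delta = new
        if converged:
            break
    z = y / delta[g]
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    sigma2 = np.mean((z - X @ beta) ** 2)
    return beta, delta, sigma2
```

The returned `delta` is ordered by sorted subgroup label. Rescaling all $\delta_j$ by a common factor leaves the likelihood unchanged once $\beta$ and $\sigma$ are rescaled to match, which is why the normalization step is harmless here.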

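The necessary and sufficient condition for the existence of a
unique solution to this system of equations with all
$\hat{\delta}_j > 0$ is that

$$\det |Q| \neq 0,$$

where $Q = Y'MY$, $M$ is the $T \times T$ idempotent matrix
$I - X(X'X)^{-1}X'$, and $Y$ is the $T \times J$ matrix, which assigns the
values of $y_{ij}$ to separate columns according to subgroup. Thus,

$$Y = \begin{bmatrix}
y_{11} & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
y_{T_1 1} & 0 & \cdots & 0 \\
0 & y_{12} & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & y_{T_2 2} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & y_{T_J J}
\end{bmatrix}.$$

Some of the implications of this condition are discussed in Section
III.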
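The existence condition can be checked numerically before estimation. The sketch below builds $Y$ and $Q = Y'MY$ as defined above. It is a minimal illustration: the function name `multiscale_existence_check` and the relative tolerance `tol` are assumptions. Because $Q$ is positive semidefinite, a nonzero determinant is tested through its smallest eigenvalue rather than through the determinant itself.

```python
import numpy as np

def multiscale_existence_check(y, X, groups, tol=1e-10):
    """Return Q = Y'MY and whether it is numerically nonsingular."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    labels, g = np.unique(np.asarray(groups), return_inverse=True)
    T, J = len(y), len(labels)
    Y = np.zeros((T, J))
    Y[np.arange(T), g] = y                     # one column per subgroup
    # MY is the residual of each column of Y after regression on X.
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    MY = Y - X @ coef
    Q = MY.T @ MY                              # equals Y'MY since M is idempotent
    smallest = np.linalg.eigvalsh(Q)[0]
    return Q, bool(smallest > tol * np.trace(Y.T @ Y))
```

As a usage example, with `X` a column of ones and two subgroups whose values are constant within subgroup (say 3 and 6), some rescaling of the subgroups is fitted exactly by `X`, and the check reports that the condition fails.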

Properties  of  ML  Estimates 

Asymptotic properties for the multi-scale model can be defined as

$$T_j \to \infty \ \text{for some or all } j,$$

$$J \to \infty.$$

The usual theorems for ML estimates apply only where the number of
parameters is fixed or at least bounded. Since the number of
parameters in $\theta$ is $J + K + 1$, it will remain fixed only as the
number of subsets remains fixed. Hence we are able to define
consistency only as

$$T_j \to \infty \ \text{for all } j.$$

We conjecture that with certain restrictions on the data, ML estimates
are consistent (in this sense), jointly asymptotically normal, and
asymptotically efficient.

The literature on the asymptotic properties of ML estimates (see
LeCam [16]) suggests as an estimate of the dispersion matrix of the
parameters in the limit

$$\lim_{T \to \infty} \mathrm{Var}\,\sqrt{T}\,(\hat{\theta} - \theta) = \Gamma^{-1}(\theta), \tag{A.12}$$

where

$$\Gamma(\theta) = -E\,\frac{\partial^2 \log L}{\partial\theta\,\partial\theta'}.$$

An estimator of the asymptotic distribution of the ML estimates can be
calculated from the matrix of second partial derivatives. In
particular,

$$\Gamma(\theta) = -\begin{bmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{bmatrix}, \tag{A.13}$$

where

$$A_{11} = E\,\frac{\partial^2 \log L}{\partial\beta\,\partial\beta'} = -\frac{1}{\sigma^2}(X'X), \tag{A.13.1}$$

$$A_{12} = E\,\frac{\partial^2 \log L}{\partial\beta\,\partial\delta'}, \tag{A.13.2}$$

$$A_{22} = E\,\frac{\partial^2 \log L}{\partial\delta\,\partial\delta'}, \tag{A.13.3}$$

$$A_{33} = E\,\frac{\partial^2 \log L}{\partial(\sigma^2)^2}, \tag{A.13.4}$$

$$A_{13} = E\,\frac{\partial^2 \log L}{\partial\beta\,\partial\sigma^2}, \tag{A.13.5}$$

and

$$A_{23} = E\,\frac{\partial^2 \log L}{\partial\delta\,\partial\sigma^2}. \tag{A.13.6}$$

The value $\hat{\sigma}^2 (X'X)^{-1}$ will be recognized as the estimate of
the variance-covariance of the $\beta$ coefficients in the classical
normal regression model. The need to adjust these estimates to take
account of the presence of the vector $\delta$ depends on the inverse of
the matrix $\Gamma(\theta)$. If $A_{13} = A_{23} = 0$, then

$$\Gamma^{-1}(\theta) = -\begin{bmatrix} \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}^{-1} & 0 \\ 0 & A_{33}^{-1} \end{bmatrix}.$$

Thus, the usual estimators of the variance-covariance matrix will be
provided by $\Gamma^{-1}(\theta)$ if and only if

$$A_{12} = \lim_{T_j \to \infty} E\,\frac{\partial^2 \log L}{\partial\beta\,\partial\delta'} = 0. \tag{A.14}$$

In this case $\lim T^{-1}\Gamma$ is a block diagonal matrix, and the
variance of $\hat{\beta}$ is merely $-A_{11}^{-1}$. In general, condition
(A.14) will not hold. Suppose the model (A.1) and (A.3) is transformed
to replace $\delta_1$ with its value as determined by the side condition:

$$\delta_1 = \prod_{j=2}^{J} \delta_j^{-T_j/T_1}. \tag{A.15}$$

To simplify the problem further, we have assumed that all subsets
contain n observations ($T = nJ$). The expected value of the partial
derivative is

$$E\,\frac{\partial^2 \log L}{\partial\beta\,\partial\delta_j} = \frac{1}{\sigma^2\delta_j}\left(X_1'X_1 - X_j'X_j\right)\beta \tag{A.16}$$

for all j. The only way for these values to be zero for all parameter
values is for

$$X_1'X_1 = X_2'X_2 = \cdots = X_J'X_J. \tag{A.17}$$

Hence if the raw moment matrices tend to equality in the limit, then

$$E\,\frac{\partial^2 \log L}{\partial\beta\,\partial\delta'} = 0 \tag{A.18}$$

and the asymptotic variance of $\hat{\beta}$ is merely

$$\sigma^2 (X'X)^{-1}.$$

Thus, the "t-scores" in the regression program would require no special
adjustment (except for degrees of freedom) in this special case.1

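1 In all estimates, we reduced the total number of degrees of freedom
by $2J + K$ to take account of the number of parameters in $\alpha$,
$\beta$, $\delta$.

The addition of the vector $\delta$ to the classical linear model should
in general increase the variance of $\hat{\beta}$. However, this cannot be
demonstrated here for the multivariate case. The asymptotic variance
of $\hat{\beta}$ is

$$\mathrm{Var}(\hat{\beta} - \beta) = -\left(A_{11} - A_{12}A_{22}^{-1}A_{21}\right)^{-1}, \tag{A.19}$$

according to the rule for inversion of a partitioned matrix. $A_{11}$,
$A_{12}$, and $A_{21}$ are as previously defined. $A_{22}$ represents

$$E\,\frac{\partial^2 \log L}{\partial\delta\,\partial\delta'}.$$

The diagonal elements of this matrix are

$$-\frac{1}{\sigma^2\delta_j^2}\left(4n\sigma^2 + \beta'X_1'X_1\beta + \beta'X_j'X_j\beta\right)$$

and the off-diagonal elements are

$$-\frac{1}{\sigma^2\delta_j\delta_h}\left(2n\sigma^2 + \beta'X_1'X_1\beta\right).$$

The adjustment to be made to the asymptotic variance of $\hat{\beta}$ can
be shown exactly for the case with one behavioral parameter in $\beta$
and two subsets. Using side condition (A.3) to replace $\delta_1$,
$\delta_2$ with a single value of $\delta$ (taking $\delta_2 = \delta$ and
$\delta_1 = 1/\delta$) results in the following matrix for
$(\beta, \delta)$, where again both subsets are of size n and the cross
terms with $\sigma^2$ have zero expectation:

$$E\,\frac{\partial^2 \log L}{\partial\theta\,\partial\theta'} = \frac{1}{\sigma^2}\begin{bmatrix} -\sum x^2 & \dfrac{\beta}{\delta}\left(\sum x_1^2 - \sum x_2^2\right) \\ \dfrac{\beta}{\delta}\left(\sum x_1^2 - \sum x_2^2\right) & -\dfrac{4n\sigma^2 + \beta^2\sum x^2}{\delta^2} \end{bmatrix}. \tag{A.20}$$

It can be shown that the limiting value of the variance of
$\hat{\beta}$ is

$$\lim_{T \to \infty} \mathrm{Var}(\hat{\beta} - \beta) = \frac{\sigma^2}{\sum x^2}\left(\frac{2 + \dfrac{\beta^2 \sum x^2}{T\sigma^2}}{2 + \dfrac{\beta^2 \sum x^2}{T\sigma^2} - \dfrac{\beta^2\left(\sum x_1^2 - \sum x_2^2\right)^2}{T\sigma^2 \sum x^2}}\right). \tag{A.21}$$

The expression before the parenthesis is the Var $\hat{\beta}$ in the
classical normal model. The expression in parentheses takes on a
minimum value 1 where

$$\frac{\sum x_1^2}{T/2} = \frac{\sum x_2^2}{T/2}.$$

Where the variances of subsets 1 and 2 differ, the Var $\hat{\beta}$ takes
on larger values. For $\beta = 1$ and

$$\frac{\sum x_1^2}{T/2} = 9\,\frac{\sum x_2^2}{T/2},$$

lim Var $\hat{\beta}$ is nearly twice as great as where the subset
variances are equal.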
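The size of the adjustment in the two-subset, one-regressor case can be illustrated numerically. The sketch below evaluates the ratio of the limiting variance to the classical variance $\sigma^2 / \sum x^2$, written in terms of the subset mean squares $v_1 = \sum x_1^2/(T/2)$ and $v_2 = \sum x_2^2/(T/2)$. The formula is worked out here from the information matrix for $(\beta, \delta)$ and should be read as a reconstruction. The function name `variance_inflation` and the illustrative values $\sigma^2 = 1$ and $v_2 = 1$ are assumptions, not values stated in the text.

```python
def variance_inflation(beta, sigma2, v1, v2):
    """Ratio of lim Var(beta-hat) to the classical sigma^2 / sum(x^2).

    v1 and v2 are the mean squares of x in subsets 1 and 2, which are
    assumed to be of equal size.
    """
    m = 0.5 * (v1 + v2)                        # overall mean square of x
    base = 2.0 * sigma2 + beta ** 2 * m
    penalty = beta ** 2 * (v1 - v2) ** 2 / (4.0 * m)
    return base / (base - penalty)

equal_case = variance_inflation(1.0, 1.0, 1.0, 1.0)     # subsets alike
unequal_case = variance_inflation(1.0, 1.0, 9.0, 1.0)   # v1 = 9 * v2
```

Under these assumed values the ratio is 1 when the subsets are alike and 7/3.8, or about 1.84, when one mean square is nine times the other, which is in line with the "nearly twice" remark.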
LEAST SQUARES ESTIMATES

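LS estimation minimizes the sum of squared residuals subject to
the side condition in the vector $\delta$. In particular,

$$S = \sum_{j}\sum_{i}\left(\frac{y_{ij}}{\delta_j} - X_{ij}\beta\right)^2 + \lambda^* G(\delta). \tag{A.22}$$

The first-order conditions with respect to $\beta$ and $\delta$ are

$$-2\sum_{j}\sum_{i} X_{ij}'\left(\frac{y_{ij}}{\delta_j} - X_{ij}\beta\right) = 0 \tag{A.23.1}$$

and

$$-2\sum_{i}\left(\frac{y_{ij}}{\delta_j} - X_{ij}\beta\right)\frac{y_{ij}}{\delta_j^2} + \lambda^*\frac{\partial G(\delta)}{\partial\delta_j} = 0, \qquad j = 1, \ldots, J. \tag{A.23.2}$$

(A.23.1) differs from the same condition for ML estimates (A.8.1) only
by a constant $(-1/2\sigma^2)$, and hence yields the same conditional
estimates for $\beta$.

Under side condition (A.3) LS and ML estimates of $\beta$ and $\delta$ are
identical. The first-order conditions for LS and ML are

$$\frac{1}{T_j}\sum_{i}\left(\frac{y_{ij}}{\delta_j} - X_{ij}\beta\right)\frac{y_{ij}}{\delta_j} = \frac{\lambda^*}{2} \tag{A.24.1}$$

$$\frac{1}{T_j}\sum_{i}\left(\frac{y_{ij}}{\delta_j} - X_{ij}\beta\right)\frac{y_{ij}}{\delta_j} = (1 - \lambda)\sigma^2 \tag{A.24.2}$$

Inasmuch as $\sigma$ is constant for all j, then (A.24.1) and (A.24.2)
differ only by a constant and, thus,

$$\lambda^* = 2(1 - \lambda)\sigma^2. \tag{A.25}$$

Moreover $\beta^* = \hat{\beta}$ and $\delta^* = \hat{\delta}$. Asymptotically,
LS estimates have all the properties of ML estimates plus

$$\operatorname{plim} \lambda^* = 2\sigma^2. \tag{A.26}$$

It can be proved that for ML and LS estimates to be equivalent
the side condition must be in the identical form of (A.3). Take any
side condition $G(\delta) = 0$ that is strictly separable in the
$\delta_j$. The ML condition for $\hat{\delta}_j$ is

$$\frac{1}{T_j}\sum_{i}\left(\frac{y_{ij}}{\hat{\delta}_j} - X_{ij}\hat{\beta}\right)\frac{y_{ij}}{\hat{\delta}_j} = \hat{\sigma}^2\left(1 - \lambda\,\frac{\hat{\delta}_j}{T_j}\frac{\partial G}{\partial\delta_j}\right), \tag{A.27}$$

and the LS condition for $\delta_j^*$ is

$$\frac{1}{T_j}\sum_{i}\left(\frac{y_{ij}}{\delta_j^*} - X_{ij}\beta^*\right)\frac{y_{ij}}{\delta_j^*} = \frac{\lambda^*}{2}\,\frac{\delta_j^*}{T_j}\frac{\partial G}{\partial\delta_j}. \tag{A.28}$$

The conditions are equivalent so that $\beta^* = \hat{\beta}$ and
$\delta^* = \hat{\delta}$ under some weak conditions if and only if the right
sides of (A.27) and (A.28) are the same for every subset and do not
depend on any element of $\hat{\beta}$ or $\hat{\delta}$. Thus, at most we
could have

$$\frac{\delta_j}{T_j}\frac{\partial G}{\partial\delta_j} = f(G). \tag{A.29}$$

But since $G = 0$, then $f(G) = c_0$, a constant. Since $G$ is strictly
separable into $G_1(\delta_1) + \cdots + G_J(\delta_J)$,
$\partial G/\partial\delta_j$ can be written as $G_j'$. Thus,

$$\frac{\delta_j}{T_j}\,G_j' = c_0. \tag{A.30}$$

Simple integration yields

$$G_j = c_0 T_j \ln \delta_j + C_j, \tag{A.31}$$

where $c_0$ and the constants of integration are arbitrary. Thus, for
LS and ML to be equivalent, we must have

$$G = c_0 \sum_{j=1}^{J} T_j \ln \delta_j + C_1, \tag{A.32}$$

where $C_1$ collects the constants of integration. In (A.3) $c_0 = 1$
and $C_1 = 0$.

For a function $G$ not satisfying (A.30), either for small samples
or in the limit, LS and ML estimates are not equal in the limit,
and LS yields estimates that in general are inconsistent.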